
Grooper Overview

What is Grooper?

Grooper was built from the ground up by BIS, a company with 35 years of continuous experience developing and delivering new technology. Grooper is an intelligent document processing and digital data integration solution that empowers organizations to extract meaningful information from paper/electronic documents and other forms of unstructured data.

The platform combines patented and sophisticated image processing, capture technology, machine learning, natural language processing, and optical character recognition to enrich and embed human comprehension into data. By tackling tough challenges that other systems cannot resolve, Grooper has become the foundation for many industry-first solutions in healthcare, financial services, oil and gas, education, and government.


Grooper Customers

Oklahoma DOT, Mercy Hospital System, OLERS, Oklahoma State University, Change Healthcare, U.S. Nuclear Regulatory Commission, American Airlines Credit Union

Pricing Advice

What users are saying about Grooper pricing:
  • "Overall, their pricing is higher than the competitors, but they offer functionality that is otherwise not available."
  • "Know how many pages you will be needing to process, as the pricing is based on that."

Grooper Reviews

JG
President and COO at a computer software company with 51-200 employees
Reseller
Top 20
Good data ingestion and classification capabilities, supports various media types and formats, and the interface is easy to use

Pros and Cons

  • "The user interface is easy to use, and the flexibility is noteworthy."
  • "Technical support is definitely an area that they need improvement in, in terms of the front-line individuals."

What is our primary use case?

We are a reseller of this solution and have implemented it for a couple of our customers. We also use it as part of our own product.

Our customers use it as part of an on-premises accounts payable solution, whereas we utilize it within our own cloud solution that is used for mortgage classification and data extraction of mortgage documents.

How has it helped my organization?

Grooper allows us to automate data extraction and integrations, and we have done so in our own cloud solution. We have APIs for integration with loan origination systems, customer portals, and other proprietary systems. In our process, clients post documents to our API; an example might be a 300-page PDF file. We ingest that through Grooper, classify the documents, and extract all of the data. Then we either post the documents and data back to the customer using a return URL, or we make them available so that the customer can call another endpoint and download all of the information from us. That whole process is totally unattended, with no human intervention whatsoever. The ability for this to take place automatically is just about the number one factor for us.
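That unattended flow can be sketched roughly as follows. This is a toy illustration, not Grooper's actual API: the function names, the classification rule, and the push/pull delivery shapes are all invented for the example.

```python
# Hypothetical sketch of the unattended pipeline described above; the
# names and delivery shapes are illustrative, not Grooper's actual API.

def classify(pages):
    # Toy classifier: start a new document at each page that looks like
    # the cover page of an invoice.
    docs, current = [], []
    for page in pages:
        if page.startswith("INVOICE") and current:
            docs.append(current)
            current = []
        current.append(page)
    if current:
        docs.append(current)
    return docs

def extract(doc):
    # Toy extraction: record the page count and the first line as a title.
    return {"pages": len(doc), "title": doc[0].splitlines()[0]}

def process_submission(pages, return_url=None):
    """Classify, extract, and pick a delivery route, with no human step."""
    documents = classify(pages)
    results = [extract(doc) for doc in documents]
    if return_url:
        # Push model: results would be POSTed back to the client's URL.
        return {"mode": "push", "target": return_url, "payload": results}
    # Pull model: results staged until the client calls a download endpoint.
    return {"mode": "pull", "payload": results}
```

The point of the sketch is the shape of the process: classification and extraction happen in sequence, and delivery is decided by whether the client supplied a return URL.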

When it comes to processing difficult source data with both unstructured and semi-structured content, it does very well from an ingestion standpoint. To begin with, there are different methods on how we can get those documents. We can ingest documents that come from, for instance, an SFTP site, a file system, or right from email accounts like in Exchange.

One of the nice things about it is, for instance, if it's a file that somebody scanned and produced, such as a PDF file, and we're going to classify and extract data from that, that's great. But, if we receive electronic files, such as an XML file, a text file, an HTML file, or a searchable PDF, those are considered electronic documents, meaning that the data is embedded within it. In those cases, Grooper will allow us to extract the data right from the electronic file itself, so that we don't have to convert it to an image to then turn around and OCR to then try and get data from it.

That is a huge advantage, as pretty much every other OCR system that is out there will take that electronic file, convert it to a TIFF file, OCR it, and then extract the data from the image file. This roundabout process is susceptible to the quality of the document whereas if it's an electronic file with Grooper, the data we extract will always be a hundred percent because it's being pulled directly from the electronic file.
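The difference can be pictured with a generic example, not specific to Grooper: an electronic file such as XML already carries its values as text, so they can be read exactly as written, with no image conversion or OCR step to introduce errors.

```python
import xml.etree.ElementTree as ET

# An electronic invoice already contains its data as text, so the values
# can be pulled out exactly as embedded, bypassing any OCR step.
invoice_xml = """<invoice>
  <number>INV-1001</number>
  <total currency="USD">2450.00</total>
</invoice>"""

root = ET.fromstring(invoice_xml)
number = root.findtext("number")       # exact value; no OCR error possible
total = float(root.findtext("total"))
```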

Grooper allows us to consolidate mass amounts of data that would otherwise require a person to go through, page by page. When you set up what's called a data model, you can group fields into sections. As an example, consider a typical invoice. You have header data, footer data, and then you might have a line item table that has all of the individual line items. These make up the unit price, quantity, and line price, which then totals to your subtotal, tax, and freight, which equals your invoice total.

In most systems, you define all of those fields, so when you look at the information and when the user has to fix things, it's all sequential. With Grooper, we can create a section for the header, a section for the table, a section for the footer, and then group those fields together. This means that when the user is presented with the data, side-by-side with the actual image document, it's very intuitive because the data gets presented pretty much in the same manner that it is on the actual document. It helps speed up the amount of time that a user would take in order to make corrections.
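A rough sketch of that sectioned layout, with field names invented for illustration: fields grouped into a header, a line-item table, and a footer, with the footer totals validated against the table exactly as the invoice example describes.

```python
# Field names invented to illustrate the header/table/footer grouping.
invoice = {
    "header": {"invoice_number": "INV-1001", "vendor": "Acme Supply"},
    "line_items": [
        {"quantity": 3, "unit_price": 10.00, "line_price": 30.00},
        {"quantity": 1, "unit_price": 5.00, "line_price": 5.00},
    ],
    "footer": {"subtotal": 35.00, "tax": 2.80, "freight": 7.20, "total": 45.00},
}

# The grouping makes validation natural: line items roll up to the subtotal,
# and subtotal + tax + freight must equal the invoice total.
subtotal = sum(item["line_price"] for item in invoice["line_items"])
grand_total = round(subtotal + invoice["footer"]["tax"]
                    + invoice["footer"]["freight"], 2)
```

Presenting the data grouped this way, side by side with the image, is what lets a user verify it in the same order it appears on the page.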

In certain types of jobs, I would estimate that using Grooper saves us 70% of the time it would normally take to complete them.

Using this solution has helped to reduce the number of people involved in data extraction and classification. As an example, our largest healthcare customer processes 2,000 invoices a day, and they had 75 AP clerks who were doing data entry into PeopleSoft. Last year, we implemented a Grooper process where we automatically ingest the invoices from an email, classify them, then extract the data. We also do all of the validations for their PeopleSoft system. The number of people went from 75 down to 14.

This company has more than 3,000 suppliers, and not all of them were set up before they went into production. Since that time, every week, as they bring new suppliers into the automated process, they send my team things that need to be tweaked or introduce suppliers that we hadn't seen before and that need to be added. One type of document they receive from a supplier is a direct invoice, which is something they approve automatically after receiving it in an email and after the GL coding and other aspects are verified. The last statistic I saw is that 62% of direct invoices are going through without human intervention.

What is most valuable?

The classification feature is very good. That's the initial reason why we switched from the other product that we used to resell and then decided to utilize it within our own product. This feature doesn't require a bunch of samples like the previous technology that we utilized.

Previously, for instance, if we were classifying mortgage documents or bank statements, I had to get three or four representative samples of all of the bank statements that are out there in the country. With thousands of community banks, it's almost impossible to get all those samples. As such, we always had an issue with being able to classify a bank statement.

However, with Grooper we didn't even use samples. Instead, we put in what's called positive extractors that look for certain keywords or characteristics of what makes up a bank statement. By doing it that way, we were able to classify probably 98% of all bank statements without ever having received a sample of each.
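The idea behind those positive extractors can be sketched generically; the cue phrases and threshold below are made up, but the principle is the same: score a document by how much characteristic keyword evidence it contains, instead of matching it against per-bank layout samples.

```python
# Made-up cue phrases and threshold, sketching the keyword-evidence idea.
BANK_STATEMENT_CUES = (
    "statement period",
    "beginning balance",
    "ending balance",
    "deposits and credits",
)

def looks_like_bank_statement(text, threshold=2):
    """Classify by keyword evidence rather than per-bank layout samples."""
    text = text.lower()
    hits = sum(cue in text for cue in BANK_STATEMENT_CUES)
    return hits >= threshold
```

Because the cues describe what makes a bank statement a bank statement, a statement from a bank never seen before can still classify correctly.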

The second most valuable feature is extraction accuracy. That was an add-on bonus for us because initially, we were just doing classification, and being able to do more accurate extraction opened up another revenue source for us. We were able to add on the extraction capabilities to our classification and so now, pretty much everybody that we talk to wants not just classification, but they want extraction. Furthermore, when they see the accuracy of the extraction, everybody's very happy.

Grooper can extract from and ingest pretty much every image file type. It can handle TIFFs, JPEGs, PNGs, BMPs, basically all image file types, PDFs, all of the Office docs including Word, Excel, PowerPoint, Text files, XML files, and more. There's no limit on which file types they can process.

The data output and reporting are fully customizable. We have total control over what data we extract and have that included in an XML file. Grooper has a couple of export modules to allow you to export that XML data raw, it can do XSLT conversions to reformat it in a different manner if we have a specification for that, or we can output that to a database. For database output, we can have it inserted into the tables and fields in the way that we want them.

Grooper does not necessarily do the actual reporting, other than internal reporting as far as statistics like the batch state, how many batches, where they're at, if there are any errors, and that kind of thing. But, in terms of extraction data reporting, we do have the mechanisms to export all of the data to either a database, XML files, and other formats. We will take it from there and load that into whatever system we're going to do the actual reports in.
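The database-output path can be pictured with a small generic example; the table layout is invented, and SQLite stands in for whatever database the export actually targets.

```python
import sqlite3

# Extracted fields land in tables laid out however we specify; SQLite is a
# stand-in here, and the schema is invented for illustration.
extracted = [
    ("INV-1001", "Acme Supply", 45.00),
    ("INV-1002", "Globex", 120.50),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (number TEXT, vendor TEXT, total REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", extracted)
row_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
grand_total = conn.execute("SELECT SUM(total) FROM invoices").fetchone()[0]
```

Once the data is in tables like this, any downstream reporting tool can take over, which matches the division of labor described above.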

The user interface is easy to use, and the flexibility is noteworthy. Because of the way the system is architected, different people can follow different approaches and get the same result. For example, there are three of us in my company that are trained on Grooper. If each of us were to do the same project, the chances are that each of us would do it differently, depending on how we think and how we would set things up, such as the extraction and the order in which we want to do things. However, the outcome would always be the same.

That's one of the nice things about it because it's not like, "Okay, you only can do it one way." Rather, you can do it in different ways. Some people don't like that, because they want to be taught using a fixed sequence like, "Okay, you do A, B, C, and D, and then you get your result." The system is flexible enough that I may do step D first and then A, and then C and then B and still get the same result.

From a user interface perspective, most things are available via drop-down menus, you can select references, and point back to your extractors, and other things like that. From a GUI perspective, it's very effective.

What needs improvement?

Currently, we're still using version 2.7.2, and they're about to do the beta release of their version 2021. We expect that some of our issues will be fixed in this coming version.

We've had challenges in classification tasks where similar documents were flagged as multiple matches. The system would identify them and say, "Hey, I think I've got multiple matches. It could either be this one or that one." Because of that, it required us to instruct the system to either leave it unclassified, or we had to halt the process for somebody to look at it.

With the new version for 2021, they have changed the paradigm. As it is now, we're using something called a form type, where pages within the document are referenced using a specific page number. For example, in a ten-page document, you might refer to information specifically on the first or fifth page. In the new paradigm, there is a first, middle, and last page concept, as opposed to having the different form types with all of the different pages. What they're telling me is that it's going to make the classification more accurate. Just because the first page of two different documents looks the same, they will not be considered duplicates. Having multiple points of reference will now allow it to better distinguish them.

The other area where we have had challenges is table extraction, in cases where the data headers were not defined or the tables did not have descriptions for the columns. My understanding is that the 2021 version now handles that. Again, we don't have it and haven't been able to test it, but it's coming.

Technical support is definitely an area that they need improvement in, in terms of the front-line individuals.

For how long have I used the solution?

We have been using Grooper for two and a half years.

What do I think about the stability of the solution?

Grooper is a very stable product. As I mentioned, our cloud solution doesn't have any human intervention, and I've got one support person who monitors things, in addition to our automated tools that monitor services and the like. I just have one person to ensure that there aren't any errors or other such problems. Errors arise occasionally, for example, if we get a corrupt image or somebody sends us a document that has security on it. From Grooper itself, we've not had any issues with it crashing or hanging.

One of the huge advantages is that Grooper supports a pool of computing resources, which means that if one of our servers goes down, the licensing server detects that and adaptively changes the workload. Specifically, it will not send any new work to that device because it's not online. It will just continue to distribute the work amongst the others that are available. When it comes back online, then it'll start giving it work, automatically. It's a very nice feature to have, to be able to distribute that work across multiple resources.
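That failover behavior amounts to something like the following toy model; it is not Grooper's actual scheduler, just the shape of the idea: work is only dispatched to servers currently online, and a server that drops out simply receives nothing until it reports back.

```python
# Toy model of the pooled-worker behavior; not Grooper's actual scheduler.
def dispatch(tasks, workers):
    """Round-robin tasks across online workers; offline ones are skipped."""
    online = [w for w in workers if w["online"]]
    assignments = {w["name"]: [] for w in online}
    for i, task in enumerate(tasks):
        target = online[i % len(online)]
        assignments[target["name"]].append(task)
    return assignments
```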

What do I think about the scalability of the solution?

You can scale Grooper as much as you want. You can literally add as many servers as you require. If you're in a virtual environment, you could spin up a bunch of VMs, install Grooper on them, add those into the thread pool, and just tell it about them. They can now participate in the process.

Spinning up a VM and getting it prepared, including installing the software and adding it to the thread pool, can be done in about 20 minutes.

In our cloud solution, we maintain a service level of 10 minutes or less, from the time we receive a file from a customer to the time we deliver it back. Our average is four minutes. Early this year, we were starting to get that towards eight minutes because we were increasing volume. We literally just called our cloud provider and asked them to enable another server for us. We installed the software, added it into the thread pool, and we now are handling 30% more volume and we're back down to that four-minute turnaround time.

It really scales.

How are customer service and technical support?

The only time that we reach out to them is when we encounter issues like bugs. Sometimes we'll find something that doesn't look right, so we'll submit a ticket and have somebody review it. I will say that's probably one area that they would definitely need some improvement in, particularly with the front-line individuals.

When we submit a ticket, usually they'll ask all of the basic things, as well as request we send the logs and other relevant data. They'll go down the checklist. Specifically for our company, because we're a reseller and we know the product very well, we have already done all of these things. I know it's probably standard protocol, but I think they should train the individuals to know the difference between a regular customer who just implemented Grooper and our organization, who's an actual reseller and has implemented their solution, as well use it internally. It's a waste of time for them to ask all those things because we know that there really is a problem, and want to get on to solving it.

Once it gets beyond the first level, on to the engineering team or the development team, they have been very good and responsive about providing fixes and patches. From that aspect, I don't necessarily have an issue. It's more just the first level of support.

We don't have the level of support that would give us an assigned engineer. In a couple of cases where we ran into some issues that were more urgent, I reached out to our account managers. At that point, he got in contact with the product manager and they called me right away. They were able to get some people on the phone and handled it immediately, but there isn't a designated engineer for our account.

Which solution did I use previously and why did I switch?

We used to resell and implement another product prior to Grooper.

The most recent one we have worked with is the Ephesoft product. We're still a reseller of Ephesoft, technically, and have been for approximately seven years. We actually adopted version one, seven years ago or so. We've got perhaps 25 implementations, who are customers that we still support today.

As an example, I mentioned our cloud solution for mortgage classification extraction. I tried to build that three years ago with Ephesoft, but it just didn't lend itself to it. For one, the accuracy level wasn't there. The problem is that we would need to have representative samples of every document. We've got over 1,500 distinct documents in that model, so trying to find 20 samples of 1,500 documents would just take forever. The other problem was that there were large limitations on the extraction side, as far as table extractions. Even to this day, they still have issues with that. It is important to remember, however, that we used it because it was the best thing we had at the time.

Before that, we used PSIcapture, which is a PSIGEN product. We used that for between four and five years before we switched to Ephesoft. Of course, we used Captiva (now known as OpenText Intelligent Capture), and IBM's offering as well.

I can tell you that previously, with the last product that we used to resell, setting up that accounts payable system for the healthcare organization that I have described probably would have taken us six months. With Grooper, we were able to get the entire product all done, with the integration, in six weeks.

It's such a big improvement because there's just so much more that's out of the box. With the other product, we had to do a lot of scripting and write services around it in order to get data into it, and once we got the data back out, we had to do a lot of other stuff too. Whereas with Grooper, there's just so much functionality within the product itself that we don't actually have to write all those things.

In terms of the learning curve between products, from a training standpoint, Grooper is definitely more involved. With the other products, you could go through a two-day class and learn enough to be able to get started. With Grooper, you're going to spend a minimum of a week. Ideally, you should take the other classes as well. So, it's essentially a two-week training period, and that's assuming that you have a capture background.

By "capture", I mean that you should be familiar with scanning, image processing, all of the capabilities with respect to cleaning up images and OCR, and things like that. It is more involved because there's just so much more functionality within the product. Whereas the other products have a very simple user interface, but then you're very limited on what you can actually do.

One of the big benefits of Grooper, and one of the reasons why I switched our company away from Ephesoft, is that Ephesoft is licensed based on the number of cores. On an entry-level four-core server, you can process 20 pages a minute; the time is consumed by OCR under the licensing that they use. When you do the math, you realize that if that server were running 24/7, processing non-stop, it would process about two million pages in a year.

Well, if all of a sudden, you need to do more volume, but in a shorter time, you have to add more servers and more cores. Of course, now you have to buy much higher licenses and then it just starts escalating from a cost standpoint. The way Grooper works, it's licensed based on the number of pages per year and they don't care how many resources, from a server perspective, you deploy.

In our case, as an example, we brought some new customers onto our cloud solution. We just added more servers, made those servers available in what's called a thread pool, and Grooper started distributing work across multiple servers, all without affecting my license at all. You can actually do what's called crowd computing.

In an organization, you could install Grooper on perhaps 50 desktops and then add them to the server. You would tell the server that these 50 computers are out there on the network and are available. Assume that each had four cores. What Grooper will do is to monitor through the day and night and determine whether any of those resources are available. It'll send them tasks automatically and lets those computers do the processing and offload some of the work. Because of that, we're able to get stuff through really fast. We could split up, for example, a batch of 300 pages, maybe across 20 computers. These don't have to be servers; rather, they can be desktops that are not being heavily used at the time. Now, we can process all of those tasks in a matter of seconds.
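Splitting a 300-page batch across 20 machines, as described, is essentially this kind of chunking. This is a generic sketch of the idea, not Grooper's distribution logic.

```python
def split_batch(pages, machines):
    """Split a page batch into near-equal chunks, one per available machine."""
    size, extra = divmod(len(pages), machines)
    chunks, start = [], 0
    for i in range(machines):
        # The first `extra` machines absorb one leftover page each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(pages[start:end])
        start = end
    return chunks
```

With 300 pages and 20 machines, each machine gets a 15-page chunk to process in parallel.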

Not only is the work done more quickly, but the redundancy created by the pool of computing resources adds stability to the workload.

How was the initial setup?

With respect to setting Grooper up, it's straightforward. Where the complexity comes in is, figuring out how we're going to integrate it with the customers' systems. It's not necessarily a Grooper issue. It's really more on the client-side.

What was our ROI?

We have certainly seen a return on investment from this product.

We had a soft launch in 2019, but it was in 2020 that we actually launched our mortgage debt platform. For us, this opened up a whole new revenue stream that wasn't there before. Also, when we look at what we're paying compared to our revenue, it's a fraction of the cost because what we're doing is something, really, that nobody in the industry could do before. As such, we're able to charge a higher premium per file than others in the past.

As an example, let's assume that somebody was charging $3 a loan. By contrast, we're charging $5 a loan, but we can justify it because we can automatically, without any human intervention at about a 98% to 99% accuracy level, process their documents and get it back to them within four minutes.

It is similar to a situation where you can buy a car, and you can choose either the Toyota Camry or you can buy the Lexus. Generally, you're going to step up and get more value for your money. It's going to cost you more, but you're going to be driving a Lexus, which is a much nicer car.

Our customers are also saving a lot of money. For example, the one customer we process 25,000 loan files a month for, is saving about $1.5 million a year, just on labor.

As an example, this same customer asked us to put in a process for them to try and mitigate fraud. The reason is that a couple of years ago, they got caught where a title company sent them instructions for a wire transfer via email. In midstream, somebody intercepted the communication and they changed the routing number and the account number, then they bounced it back to the company like the email was undeliverable.

The company then called and said, "Hey, I got this bounced back," and they responded, "Well, I don't think we're having problems. Go ahead and resend it." In turn, they forwarded that same document. Once that loan had closed, the money was wired to an account in Russia. It was a $575,000 loan, and consequently, the company was defrauded of $575,000. What they did was put a process in place where people would check the system to find out if more than one wiring instruction had been added to the repository, and then somebody had to go and look at it.

When all of this happened, they asked us to write a program that checked those loans nonstop. Their volume is very high, at about 100,000 loans, so it would take a week for our system to cycle through them. Then within that week, they would get between 2,000 and 2,500 emails that we would have to look at because all we could tell was that there were two documents in that placeholder. We didn't know if they were two wiring instructions because at the time, we were using Ephesoft and we couldn't make that determination. The only thing we knew is that there were two documents.

Because of the necessity to check so many emails, they had approximately 18 people looking at them. It had to be done immediately because the loan is about to be funded, and they don't want to fund it before it's verified, otherwise, they run the risk of fraud again. Now that they are using our cloud-based classification extraction platform, they inquired about how our process could be further improved.

My suggestion was that we can do the same type of monitoring, but utilize an API rather than an SDK, which is much faster. Using the API, we can filter out specific loans, so instead of looking at 120,000 loans, we can look at perhaps 30,000 loans that are really active. Once we find more than one document, we'll pull those documents down, automatically ingest them, classify them, extract the relevant data, and we can now recognize, for example, that I have one document that is a wiring instruction and one that is not a wiring instruction.

In that case, I don't need to send an email because there's no issue. The only instance where we need to be concerned is if there are two wiring instructions, but the routing number and account number are different. If they are the same then it doesn't matter. However, if we find that there are however many, but one happens to be different, now I need to alert the client.
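The alerting rule just described reduces to a small piece of logic; the field names below are invented for illustration, but the rule mirrors the one above: flag a loan only when it holds more than one wiring instruction and the bank details differ between them.

```python
# Field names are invented; the rule mirrors the one described above.
def needs_alert(documents):
    """Alert only on conflicting wiring instructions within one loan file."""
    wires = [d for d in documents if d["type"] == "wiring_instruction"]
    if len(wires) < 2:
        return False                 # zero or one instruction: no issue
    details = {(w["routing_number"], w["account_number"]) for w in wires}
    return len(details) > 1          # identical duplicates are fine
```

Classification is what makes this possible at all: without knowing which of the two documents are wiring instructions, every pair has to be escalated to a human.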

As a result of our newest implementation, our client is receiving perhaps eight emails each day, instead of hundreds of emails a day. Now, those 18 people can focus on their normal job, as opposed to having to go in and do research on these loans when really, once they get in there, 99% of the time there's not a problem.

What's my experience with pricing, setup cost, and licensing?

The way it's licensed is on an annual per-page basis, which is something that I don't see as an issue at all. Overall, their pricing is higher than the competitors, but they offer functionality that is otherwise not available. The way we justify that to the customers when we're implementing is to have them look at the additional functionality. If a competing product is cheaper but it can't do the job, it doesn't really matter.

From my perspective, when I talk to our prospects and customers, I explain that it does no good to compare prices. Some of them will compare Grooper to Ephesoft and point out that Ephesoft is whatever percentage cheaper, say 15% or 20%. In response, I explain that Ephesoft can't do what they are asking to be done, so it doesn't matter if it's 100% cheaper. It can't do it, so you have to think of things in a different mindset at that point, aside from the licensing aspect.

There really isn't anybody else that I'm aware of that's on their level, so I think they can command it. When somebody else comes along that can do the same things that they can do, then I think at that point the pricing will probably get adjusted.

Which other solutions did I evaluate?

We have been in this business a long time and have tried a variety of other products.

What other advice do I have?

We've written code for Grooper, although it has been utilized primarily for validations, rather than for the actual extraction. The extraction is something that we've pretty much handled all through the user interface. However, once we pull pieces of information and we want to validate that to a third-party system or an external database, for example, we have written our own scripts to take the extracted data. The operations will be things like a database lookup, performing validations, pulling back some more information, and then updating additional fields. But for the extraction itself, we really have not had to write code.

The fact that extracting data didn't require scripting was not a deciding factor for us. However, it is an important factor because most people want to have a business analyst support the process, rather than having to hire a developer.

I would rate this solution a nine out of ten.

Disclosure: IT Central Station contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Reseller/Implementer
DG
Senior Consultant at a tech services company with 11-50 employees
Consultant
Improves the quality of the documents that we're processing and reduces the turnaround time it takes to process these documents

Pros and Cons

  • "Grooper processes difficult sorts of data and unstructured or semi-structured content very well. It's probably one of the better solutions I've seen compared to other solutions I've seen out there. It does a lot more things like segmentation extraction. It does it a lot better. Grooper has more focus on these types of freeform documents where other solutions are very generic and this is a little more elaborate in what they've done. I think they take it to the next level of extracting freeform data."
  • "Grooper is new. It's new beta stuff, so we've had some issues, but that's understandable. Getting the beta product to more of a true release is where it needs improvement. I'm going through training now, so it's hard to judge what they have and don't have until I get through that training. Training is the main thing for me because I'm trying to learn and take things I've learned from other products and try to transfer that knowledge to this one."

What is our primary use case?

We use Grooper for itemized bills. We're doing it to extract many lines of data in a very freeform aspect, capturing columns and rows and calculating the itemized bill amounts to basically affirm that the data on the iBill is correct.

Right now, we're focusing on PDFs. We're adjusting that by converting them to TIFF.

How has it helped my organization?

It enables us to automate data extraction and integration. That's very important because it helps us with improving the quality of the documents that we're processing, and it reduces the turnaround time it takes to process these documents.

It's going to really improve our organization in processing documents, where today, they're being full-keyed completely manually, and we're going to hopefully get rid of 90% of the labor, the full-keying aspect of it.

Grooper enables us to consolidate mass amounts of data that would otherwise require a person to go through page by page. It has affected the turnaround time and quality. We have issues today when people try to key it and we still get problems and we won't find out until two days later. Now, if there's a problem, we find out immediately. We know where to go fix it and we get it out the door very quickly.

In the beginning, we needed a software developer to write code to configure extract jobs within the solution, so we used professional services to get us going.

It will reduce the number of people involved in data extraction and classification in my organization.

What is most valuable?

The extraction is the most valuable feature. That's the heavy lifting of what we do. We either extract everything accurately, or we have to full-key the complete itemized bill, which can take days. Now we're doing work that takes literally minutes, sometimes seconds, and we've reduced the labor a hundredfold.

Grooper handles difficult, unstructured, and semi-structured content very well. It's probably one of the better solutions I've seen. It does things like segmentation and extraction, and it does them better than other tools. Grooper focuses on these types of freeform documents, where other solutions are very generic; this one is more elaborate. I think they take freeform data extraction to the next level.

Using the GUI-based application to configure extract jobs is better now. I've just gone through some training on it, and it was like drinking from a fire hose. To be fair, once I got into it, it was a lot easier than it first seemed. In the beginning it can be a little difficult to figure out where everything is, just because it's a new environment, but overall it's of medium difficulty.

It enables us to modify the output. That's very important to us because we extract data both before it's sent to someone to correct any issues and after, so we can compare the two.

The data classification abilities are very good. The keyword labeling works well: you enter keywords, and they can be matched in different layouts, horizontal or vertical, or combined in and/or conditions to identify different documents.
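As a rough illustration of this kind of keyword logic, here is a sketch in plain Python. The `classify` helper and rule format are invented for the example and are not Grooper's actual configuration:

```python
# Classify a document by keyword rules: every "all" keyword must appear,
# and at least one "any" keyword (if the rule has an "any" list).
def classify(text, rules):
    """rules: {doc_type: {"all": [...], "any": [...]}} -> first matching type."""
    t = text.lower()
    for doc_type, rule in rules.items():
        if all(k in t for k in rule.get("all", [])) and \
           (not rule.get("any") or any(k in t for k in rule["any"])):
            return doc_type
    return "Unclassified"

rules = {
    "Invoice": {"all": ["invoice"], "any": ["amount due", "total due"]},
    "Medical Record": {"all": ["patient", "diagnosis"]},
}
doc_type = classify("INVOICE #123 ... Total Due: $99.75", rules)  # "Invoice"
```

A real classifier would also weigh keyword position and layout, which is what makes the horizontal/vertical matching described above useful.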

What needs improvement?

Grooper is new, so we've had some issues, but that's understandable. Getting the beta product to a true release is where it needs improvement. I'm going through training now, so it's hard to judge what they have and don't have until I get through it. Training is the main thing for me, because I'm trying to transfer knowledge from other products to this one.

But from what I've seen so far, it does very well. In the beginning, it was very frustrating because I didn't know much about it, but now, as I'm getting more into it, it's not as bad as I thought it was going to be. I'm starting to use it better. I'm able to configure things really quickly. I see this as a really good product for us in the long term.

I would like more detail on how things were done, but given how some things are being extracted, it's hard to judge; the detail may already be there. It wouldn't be fair to say it's lacking in that area.

The stability of the environment needs improvement because it's new and they had some hiccups, but we got through it.

For how long have I used the solution?

I have been using Grooper since January.

I've been involved since January, but we went live with it in March and April. We did medical records first, where we were sending OCRed documents back to a customer. We're not doing bookmarking yet because we still have to figure out how we want to lay it out. We only went live with the actual extraction piece in April, and we couldn't turn it on fully because we still use other vendors for some of the processing right now.

What do I think about the stability of the solution?

Stability is good now. I rarely have any issues. Sometimes we'll have issues with the REST API. Today, for example, we had an issue with the OCR on one of the images, and they found a problem with one of their processes, which they're sending to the build team to fix. But they responded pretty quickly, gave us a workaround, and got it through.

What do I think about the scalability of the solution?

I believe it's very scalable. I have multiple machines running against the environment. I have many threads running against the environment as well.

There are a couple of users right now and they're document specialists.

Deployment and maintenance require one FTE. Once it's deployed, it's fine. I'm going to still be monitoring and adjusting things as needed, but one person is needed overall.

Right now we're processing a couple of projects. One's doing OCR only, just to OCR PDFs. Another one is processing those documents that have many service lines on them.

We're expected to go full volume in probably the next couple of months on all of it.

How are customer service and technical support?

They have good technical support. They respond pretty quickly and they jump on with me when needed.

How was the initial setup?

The initial setup was complex because we ran into bugs, as well as licensing issues at the beginning: we couldn't have two separate environments.

Grooper has more of a learning curve than other solutions. The other solutions I've used have more of a built-in workflow, and that is something I would like to see.

Grooper should have a more enhanced workflow: decision nodes and a tree-like view where we can actually evaluate different attributes. I'm sure you can do that, but it's probably not as clean as in Captiva, for example. Of course, Captiva has been around forever, so it has a more mature workflow.

I'd also like to see a dashboard, like a work-in-progress dashboard, made a little bit better than what I've seen before. But I'm comparing against things that were very specific to what I've needed at other places in the past.

It took probably four or five months to get it deployed because we had to use professional services to help us get off the ground with the different formats and document types we had to create. They helped us create about 60 doc types.

While we were in beta, we were in a dev environment, and then we cut over to production and we had some issues with the licensing because we were trying to figure all that out with the new environment.

Another issue we had was that upgrading wasn't as clean as it could be. We accidentally upgraded an environment by updating one machine that was still connected to another, and it updated the shared database. Once one machine upgrades the SQL database to a new version, the other machines report a version mismatch and can't process anymore. That was a big headache. We had to back things up.

What about the implementation team?

We had support from them for the deployment.

What other advice do I have?

Make sure you understand OCR and that it's not perfect; it's as much an art as a science. Unless you put the right validation rules in place, rather than just running documents through Grooper, you won't be able to use its full capabilities. If you go in thinking you can just do the extraction and go out the door, without any adjustments, rules, or business rules to modify the data, it's not going to do well for you. That's true of any OCR product.
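A minimal sketch of this advice, assuming simple regex-based rules. The rule names and fields are illustrative, not part of any OCR product:

```python
# Validate extracted fields against format rules before trusting OCR output.
# A common OCR error like "S" for "5" fails the amount rule and gets flagged.
import re

RULES = {
    "date":   lambda v: bool(re.fullmatch(r"\d{2}/\d{2}/\d{4}", v)),
    "amount": lambda v: bool(re.fullmatch(r"\d+\.\d{2}", v)),
}

def validate_record(record):
    """Return the names of fields that fail their rule."""
    return [f for f, check in RULES.items() if f in record and not check(record[f])]

failures = validate_record({"date": "03/15/2021", "amount": "99.7S"})  # ["amount"]
```

Flagged fields would go to a person for correction instead of straight out the door, which is the point of the advice above.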

When dealing with a beta product, know that there are still going to be issues, including issues no one knows about until a lot of volume, or a lot of different scenarios, have been run through it.

I would rate it a seven and a half out of ten. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: IT Central Station contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
reviewer1509882
Data Wizard at P&P Oil & Gas Solutions, LLC
Real User
Top 10
Good image processing and OCR capabilities, lots of flexibility with extractors, helpful support

Pros and Cons

  • "There are many options and customizations that you can make to each individual extractor that allows you to tweak it for exactly what you need."
  • "If Grooper could "sense" important fields on the document and auto-build extractors for them, that'd be really cool."

What is our primary use case?

We use Grooper to extract data from scanned documents, perform data validation, and import it into various databases.

A couple of specific uses are:

  1. Extract data from invoices and validate the item costs against a price list database table, and generate a report for management to discuss issues with the vendor.
  2. Reading through oil and gas leases and related documents, extracting out certain pieces of information and clauses for review by an analyst, and then formatting it for import into the land records database.
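The first use case, checking extracted item costs against a price-list table, can be sketched like this. The `price_exceptions` helper and field names are assumptions for illustration, not the reviewer's actual implementation:

```python
# Compare extracted invoice lines against a price-list table and collect
# discrepancies for a management report.
def price_exceptions(invoice_lines, price_list):
    """Return lines whose unit cost differs from the agreed price."""
    exceptions = []
    for line in invoice_lines:
        agreed = price_list.get(line["item"])
        if agreed is None or line["cost"] != agreed:
            exceptions.append({**line, "agreed": agreed})
    return exceptions

price_list = {"PIPE-2IN": 14.00, "VALVE-A": 32.50}
lines = [{"item": "PIPE-2IN", "cost": 14.00},
         {"item": "VALVE-A", "cost": 35.00}]
report = price_exceptions(lines, price_list)  # one exception: VALVE-A
```

Only the exception lines reach management, which is what makes the report useful for vendor discussions.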

How has it helped my organization?

Using Grooper has sped up our data entry process between two and eight times, depending on the exact process.

It has allowed us to expand our service offerings, as well.

We are able to get work done for clients faster with confidence. A lot of things can speed up a process, but few can speed up a process while maintaining or even improving accuracy.

There is some "startup time" to get the new process in place, but we quickly see that time returned by the improved process. We've been able to get very detailed with our process optimizations.

What is most valuable?

  1. There is a lot of flexibility with extractors. There are many options and customizations that you can make to each individual extractor that allows you to tweak it for exactly what you need. You can then create a collection of extractors for a single field with rules about which one(s) to prefer.
  2. Image processing and OCR. Technically these are two different segments of the platform, but they can be interdependent. Being able to clean up a document before OCRing it and having multiple OCR options lets us get the best results for each document.
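The idea of running multiple OCR options and keeping the best result per page can be sketched as follows. Engine names and the confidence field are assumptions, not Grooper's API:

```python
# Given several OCR results for the same page, keep the one the engine
# reported with the highest confidence.
def best_ocr_result(results):
    """results: list of {"engine", "text", "confidence"} dicts for one page."""
    return max(results, key=lambda r: r["confidence"])

page_results = [
    {"engine": "engine_a", "text": "Tota1 Due 99.75", "confidence": 0.81},
    {"engine": "engine_b", "text": "Total Due 99.75", "confidence": 0.94},
]
best = best_ocr_result(page_results)  # engine_b's cleaner read wins
```

Combined with image cleanup beforehand, this per-page selection is what lets each document get its best possible read.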

What needs improvement?

If Grooper could "sense" important fields on the document and auto-build extractors for them, that'd be really cool. They do have an "Infer Grid" method for tables, but it only works with specific types of tables.

For how long have I used the solution?

We have been using Grooper for one and a half years.

What do I think about the stability of the solution?

The product is very stable and displays decent error codes.

What do I think about the scalability of the solution?

It is incredibly scalable. We build all of our models with a small data set and then run large ones through them. For example, models built using about 500 revenue statements have now processed thousands.

How are customer service and technical support?

The customer support has been excellent! They have a wiki, a forum, email support, and phone support.

Which solution did I use previously and why did I switch?

We did not use another solution prior to this one.

How was the initial setup?

It was very straightforward to get installed, but using it does take training.

What about the implementation team?

We deployed in-house. It was that easy.

What's my experience with pricing, setup cost, and licensing?

Know how many pages you will be needing to process, as the pricing is based on that.

Which other solutions did I evaluate?

We did look at a few other options, namely Automation Anywhere, DocView, and ThoughtTrace.

What other advice do I have?

I'm not sure the UI could be improved, as any changes might make it harder to use or more overwhelming. Usually, the easier you make common tasks, the harder you make it to perform more advanced functions.

Overall, this solution is great!

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: My company has a special services agreement - but personally I am just a user.
reviewer1508328
Data Scientist at Intellese
Real User
Top 20
Efficient PDF extraction capabilities, and the lexicons for data input are helpful

Pros and Cons

  • "Lexicons, where the key vocabulary can be inputted, are very helpful."
  • "They should have more sub-extractors or exclusion extractors so that the user does not have to make a parent data type."

What is our primary use case?

I use Grooper to extract PDF documents.

I built different content models for different PDF files. After one model was built, it had to be adjusted for other files with the same format.

How has it helped my organization?

It helps our customers solve many problems. It also provides a new gateway to obtain data.

The files can vary in age, and as long as a file is readable, its data can be extracted into the database. Some of the files are organized very poorly, but as long as Grooper's rules are set up properly, it can extract the values.

What is most valuable?

The most valuable aspect is the whole idea of Grooper: selected physical files can be extracted into a database and analyzed. In data science, data is the most important part of the problem, and without it there is nothing to work with.

With Grooper, the old archive of information can be obtained with the models we built.

Lexicons, where the key vocabulary can be inputted, are very helpful.

The table extractors are very efficient, with three main methods. The transpose method can be used as well.
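As a rough idea of what a transpose step does when a table's headers run down the left edge instead of across the top, here is a plain-Python illustration (not Grooper code):

```python
# Swap rows and columns so a sideways table ends up in the usual
# header-row-on-top layout.
def transpose(table):
    return [list(col) for col in zip(*table)]

sideways = [["Item", "Pipe",  "Valve"],
            ["Qty",  "10",    "4"],
            ["Cost", "14.00", "32.50"]]
upright = transpose(sideways)
# upright[0] is now the header row: ["Item", "Qty", "Cost"]
```

After transposing, the same column-based extraction logic used for normal tables applies unchanged.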

What needs improvement?

When editing the extractors, the name should be shown.

They should have more sub-extractors or exclusion extractors so that the user does not have to make a parent data type.

I would like to see them revisit the behavior of positive and negative extractors, where "if the extractor provided here successfully extracts one or more values from the document, the document will be classified as this Document Type with no further processing."

There are a few bugs that my superior has posted in the Grooper exchange concerning errors with the OCR. Even when we re-recognized pages and they turned out correct at the page level, the document level was still wrong.

There are different degrees of OCR issues. A smaller issue might be that some characters are missed, whereas a bigger problem is when whole values are missed. This means the fuzzy-mode translation function needs to be used.
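Fuzzy matching of this kind can be illustrated with Python's standard difflib; Grooper's fuzzy mode is its own implementation, so this is only a conceptual sketch:

```python
# Map a slightly mangled OCR token back to a known lexicon entry, or
# return None if nothing is close enough.
import difflib

def fuzzy_lookup(token, lexicon, cutoff=0.7):
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None

lexicon = ["Total Due", "Amount Paid", "Balance"]
fixed = fuzzy_lookup("Tota1 Due", lexicon)  # "Total Due"
```

This is exactly where the lexicons mentioned above pay off: the richer the vocabulary, the more OCR damage can be repaired automatically.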

The biggest issue is when characters are hard to read even for humans. In that case, so far, there is not much that can be done.

One file that we have contains a table with the same four headers repeated many times. In this case, there is nothing Grooper can do, at least as far as we can tell based on the knowledge and skills we have learned.

For how long have I used the solution?

We have been using Grooper for the past four months.

How are customer service and technical support?

Customer support can solve most general issues, but for some very specific problems, you might need to get to higher-level technicians. 

Which solution did I use previously and why did I switch?

Grooper is our main solution engine and we did not use another one prior to this.

How was the initial setup?

The setup is very straightforward, but the learning curve is quite steep.

If the model is too big, it is hard to keep track of the building process, and when building takes too long, someone else may try to edit it at the same time.

What about the implementation team?

We deployed with our in-house team.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.