by Al Massey
When it works, Optical Character Recognition (OCR) software is one of the greatest tools ever devised. However, most OCR software either refuses to work right or is somewhat lazy in its approach to the job.
When everything works as planned, OCR software combined with the PC can make us all more productive by allowing us to manipulate, modify and reuse existing information easily without having to reenter it over and over again. An intuitive OCR package can transform virtually any printed document into a file that we can save, edit or reuse any time. It makes it possible to scan in a document, edit or add to it, and then email it, save it in a word processor or publish it to the World Wide Web in a seamless fashion.
Just about every scanner on the market comes with some kind of OCR package, and that is one of the problems. Most of these programs are either lite versions or previous-generation packages that are poorly documented and leave a lot to the imagination. As a result, a lot of people who could truly benefit from OCR get frustrated and never explore its advantages adequately.
In an effort to help reduce the frustration level, I looked at ten products and spent countless hours evaluating their features in order to cull them down to three that I feel are worthy of your attention: Presto! OCR Pro 3.0, TextBridge Pro 98, and OmniPage Pro 8.0.
Whichever package you choose, be sure to select one that supports TWAIN (popularly expanded as Technology Without An Interesting Name). Simply put, TWAIN is a programming interface that lets a graphics application, such as an image editing program or desktop publishing program, activate a scanner, frame grabber or other image-capturing device. Most scanners and OCR programs are TWAIN compliant, so this shouldn't be a problem, but it never hurts to ask.
You will find that most of the OCR programs can be upgraded from a competitive product or previous version for a great deal less than the shipping version found on the retail shelves. I have tried to select packages with the right combination of speed, accuracy, price and the ability to produce an image as close to the original as possible.
At its most basic, OCR takes a scanned image and converts the text in it into a file that can be edited. Most programs will also attempt to duplicate the fonts and layout, such as columns, headers, footers and graphics. In order to perform these tasks, OCR will divide the scanned image into zones such as text, graphics and tables, then sort the zones into a logical order so that text flows correctly from one column to the next, for example.
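To make the zone-sorting step concrete, here is a minimal Python sketch. It is an illustration only and not how any of the products reviewed here actually work: the Zone record, the reading_order function, the tolerance parameter and the pixel coordinates are all hypothetical. It assumes zones have already been found, groups them into columns by horizontal position, and reads each column top to bottom, moving left to right.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A rectangular region found on the scanned page (hypothetical record)."""
    kind: str    # "text", "graphic" or "table"
    left: int    # x position of the left edge, in pixels
    top: int     # y position of the top edge, in pixels
    width: int
    height: int


def reading_order(zones, tolerance=0):
    """Sort zones so text flows down one column before moving to the next.

    A zone joins an existing column when its left edge starts before that
    column's right edge (minus an optional tolerance); otherwise it opens
    a new column to the right.
    """
    columns = []  # list of columns, each a list of zones, built left to right
    for zone in sorted(zones, key=lambda z: z.left):
        for column in columns:
            column_right = max(z.left + z.width for z in column)
            if zone.left + tolerance < column_right:
                column.append(zone)
                break
        else:
            # No existing column overlaps this zone horizontally.
            columns.append([zone])

    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda z: z.top))
    return ordered


if __name__ == "__main__":
    page = [
        Zone("text", 320, 50, 280, 400),     # right column, top
        Zone("text", 20, 300, 280, 150),     # left column, bottom
        Zone("graphic", 20, 50, 280, 230),   # left column, top
        Zone("table", 320, 470, 280, 100),   # right column, bottom
    ]
    for zone in reading_order(page):
        print(zone.kind, zone.left, zone.top)
```

A sketch this simple breaks down quickly: a headline that spans the full page width, for instance, would pull every zone into a single column. That is one reason results on complex layouts are not always perfect.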
Of course, this is the theory behind the program, and as in all theory the results are not always perfect or pretty.
In testing these programs I used a baseline of 95% for accuracy. If a program produced less than 95% accuracy I threw it out and went on to something else.
At first blush 95% seems good, but don't be deceived: a small difference in accuracy can translate into a big loss in productivity.
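A little arithmetic shows why. The short Python sketch below is only an illustration: it assumes accuracy is measured per word and uses a hypothetical page of 500 words, neither of which is a figure from my testing. Under those assumptions, 95% accuracy leaves about 25 words per page to find and fix, while 99% leaves about 5.

```python
def expected_errors(word_count, accuracy):
    """Return the expected number of misrecognized words, rounded to a whole word."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(word_count * (1.0 - accuracy))


# Assumption for illustration: a typical page holds about 500 words.
WORDS_PER_PAGE = 500

if __name__ == "__main__":
    for accuracy in (0.95, 0.98, 0.99):
        errors = expected_errors(WORDS_PER_PAGE, accuracy)
        print(f"{accuracy:.0%} accurate: about {errors} errors per page")
```

Every one of those errors has to be spotted and corrected by hand, so a few percentage points can mean the difference between a quick proofread and retyping much of the page.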
Presto! OCR Pro 3.0
Presto! OCR Pro 3.0 from NewSoft Inc. is a new arrival to the U.S. market. This package is based on FineReader technology, which was developed by a Russian company, Bit Software (now ABBYY Software House). NewSoft stepped into the picture to help co-develop it for the American market, and I must say I have been favorably impressed with the results. It delivers high word accuracy and excellent reproductions in a sophisticated package that belies its low price.
I found OCR Pro 3.0's interface, consisting of three detachable toolbars and windows for the scanned image and OCR results, to be a pleasure to work with. However, the program only allows you to display one page at a time, and that proved to be a bit of a hindrance.
Additional pages are appended to the recognized text. In order to process documents with more than one page you have to use batch mode. There is a Recognition toolbar with buttons for scanning, zoning and recognition, as well as a Scan and Read button that scans and performs OCR in one step. This is a pretty neat feature.
When you import a document, OCR Pro uses zoning to analyze the page layout. It then designates text, picture and table blocks. Its ability to pick up the page layout of simple text files and magazine articles with graphics worked well in testing, but it faltered a bit on more complex documents. It also performed well in the Send To area, letting me export edited documents to Microsoft Word, Excel and other programs easily.
Presto! OCR Pro gives a whole different meaning to batch processing. It is designed to support multiple users on a network, so you can place a job on the network and all users can access the batch and work on individual pages simultaneously.
OCR Pro ranked near the top when it came to word accuracy with over 98%, right behind TextBridge Pro, and it did a superior job of maintaining formatting. It also proved to be almost as good at performing OCR as OmniPage and TextBridge Pro.
About the only complaint I had was the program's shortcomings in the areas of online help and documentation. It sometimes left too much to the imagination, and that resulted in a longer learning curve than I would have liked.
Presto! OCR Pro 3.0 by NewSoft, Inc. is available for $99 street price (trade up from a competitive product for $49.95). It requires a TWAIN-compatible scanner or fax-modem and Windows 95. For more info call 800-436-4365 or go to www.newsoftinc.com.
TextBridge Pro 98
If I had to pick a winner in the OCR category, then TextBridge Pro 98 from the ScanSoft Division of Xerox would be the one. Even though TextBridge lacks the feature set of OmniPage Pro, it was consistent when it came to performance. Every time I put it on the line, for speed, word accuracy and preservation of attributes, it was the clear winner. The Tara Lipinski of OCR software.
For the novice I highly recommend TextBridge Pro's simple interface, consisting of a viewing window and three large buttons for acquiring an image, performing OCR, or performing both functions automatically. In addition, the interface has three buttons for more common tasks such as opening files, saving, and editing. All this, coupled with a built-in proofreader (that bears a striking resemblance to a common spell checker) and a training mode, makes for a well-rounded package.
For the power user, TextBridge Pro came up short in features such as batch processing and the ability to reorder zones. I also found it a bit on the hazy side when it came time to set up my scanner initially. I tried this product with four different scanners, and on more than one occasion it took a bit of trial and error before I could get a scanner to work with TextBridge. Another area of concern revolves around multi-threading. The more computer resources I obtain, the more I rely on multi-threading. I simply want my software tools to do more than one thing at a time. Time is short and I don't want to waste any of it. OmniPage Pro was the only product tested that allowed multi-threading.
The pluses outweighed the minuses, however, when it came to word accuracy, where TextBridge weighed in at over 99%. In the end, word accuracy, speed on a par with OmniPage Pro and the ability to retain formatting made this my top pick. TextBridge Pro 98 was the only program I tested that consistently recognized headers and footers and inverted text.
TextBridge Pro 98 by ScanSoft, Inc., a division of Xerox, Inc. is available for about $80 street price for Windows 95/NT4. For more info call 800-432-9329 or go to www.textbridge.com.
OmniPage Pro 8.0
OmniPage and TextBridge are the two most common OCR packages bundled with scanners, so chances are strong you are already familiar with this product. If so, and you have been asking yourself if you should upgrade to the latest full-blown product, the answer is a resounding yes.
Even though I found OmniPage fell a bit short when it comes to word accuracy, its rich set of features along with a clean, intuitive interface make it a strong contender for top spot in the OCR category. It was a consistent and solid performer throughout the evaluation process.
OmniPage Pro's main screen is divided into three adjustable vertical panes that display thumbnails of scanned images, OCR results and a preview of each page. There is an AutoOCR toolbar, a set of buttons, and drop-down menus for scanning documents, zoning and OCR as well as exporting the results. OmniPage Pro makes use of Wizard technology that steps you through each stage, a feature that I wish the competition would adopt.
A Zone toolbar provides access to tools for joining, splitting and reordering zones. This feature also gives you the ability to define zones as single-column, table, graphic, or mixed. I found that this feature did not always work as advertised, but still the concept is nice and I hope they will work out the kinks in a future release.
OmniPage performed better than any of the other products when it came to batch processing. The program lets you schedule automated OCR jobs that can include both scanning and OCR. Another feature that proved of particular interest was the ability to set OmniPage to monitor a specific directory for documents to be processed. I also found the presence of multi-threading support a definite plus, because it allows you to proofread one document while the program is performing OCR on another. This greatly speeds up the entire process.
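For readers curious what monitoring a directory involves, here is a minimal Python sketch of the general idea. It is not OmniPage's implementation, which I have not seen: the function names, the file extensions and the process callback (a stand-in for whatever OCR engine would do the real work) are all assumptions for illustration. The sketch simply polls a folder at intervals and hands each newly arrived image file to the callback.

```python
import time
from pathlib import Path


def find_new_documents(inbox, seen, extensions=(".tif", ".tiff", ".bmp")):
    """Return image files in `inbox` that have not been seen yet, sorted by name.

    Each returned path is added to `seen` so it is reported only once.
    """
    new_files = []
    for path in sorted(Path(inbox).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(inbox, process, poll_seconds=5.0, max_polls=None):
    """Poll `inbox` and call `process(path)` for every new document.

    `process` is a hypothetical callback standing in for the OCR step.
    With max_polls=None the loop runs until the program is stopped.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_documents(inbox, seen):
            process(path)
        polls += 1
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Example: print the name of each image dropped into the current folder,
    # checking three times at two-second intervals.
    watch(".", lambda path: print("would OCR:", path.name),
          poll_seconds=2.0, max_polls=3)
```

A real watcher would also need to make sure a file has finished being written before picking it up, which this sketch does not attempt.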
While OmniPage came up short on word accuracy, it proved a solid performer in its ability to preserve formatting, and it translates documents into usable formats. It was also the speed demon of the pack. In the end I would highly recommend this product for the power user who doesn't mind a fairly steep learning curve.
OmniPage Pro 8.0 by Caere Corp. is available for about $499 (upgrade from competitive product $139) for Windows 95/NT. For more info call 1-800-535-7226 ext. 115 or go to www.caere.com.
Al Massey is a HAL-PC member who can be contacted at almas@hal-pc.org.