A Picture is Worth a Thousand Words - and Heaps More! The Past, Present and Future of OCR Technology

The global OCR market is expected to be valued at US$ 51,527.0 million by 2030, and “to expand at a CAGR of 15.2% during the forecast period from 2020 to 2030,” according to a report published by Transparency Market Research, in April 2020.

Optical Character Recognition is a popular and revolutionary technique used to recognize text from images and convert it into machine-readable text data. The traditional method of keeping track of data by using hand-held, printed or handwritten documents, comes with its fair set of challenges/drawbacks such as the need to manually sift through pages to derive required information, difficultly in allocating storage space, and the amount of resources (paper, ink) used in the process. With OCR, documents, across languages, can be digitized, and saved on the cloud, thereby, enabling easy access. Furthermore, the digitized documents can now be used to derive insights and make existing analyses richer.

Through this article, we will take a look at how the use of OCR has evolved over the years, and exciting future use cases around the corner.

A Brief Timeline Of OCR Technology

1910s – Optophone was built - an electronic device that scans the printed alphabet, and presents sound combinations enabling the visually-impaired to read.

1950s – Gismo was developed - a device that could recognize the 26 letters of the alphabet used in a standard typewriter.

1970s – Flatbed scanners were built to help digitize printed text – this device could only read certain fonts designed for machine readability, during its initial inception.

1990s – Newton Message Pads were enabled with handwriting recognition - while it was not a commercial success, it was a pivotal turning point for OCR technologies.

2000s – Google took up the OCR software, Tesseract, and in a few years, it was mapped to neural networks to create, arguably, the most popular OCR engine.

Use Cases of OCR technology

Digitization, automation of data: With OCR, manual data entries can be digitized and automated. Automating the process reduces the number of human errors, while digitization increases ease of data access, improves data security, improves cost efficiency, saves space, and is also more environment-friendly because it cuts down on paper usage, use of ink, etc.

Document verification: This technology can be used to verify important documents such as passports, VISAs, and other proof of identity.

Automated vehicle number plate recognition: By using images captured with CCTV cameras, miscreants can be reprimanded, and traffic rules can be enforced efficiently.

Assist visually-impaired people: OCR technology can read printed text aloud and thereby assist the visually-impaired.

Language translation: OCR can help translate content from one language to another – a technique that would prove useful while travelling/meeting people from around the world.

Evidence Management: Managing and sifting through copious amounts of evidence is a pivotal part of investigative process for defence/police authorities. Setting up OCR technologies can help to quickly access relevant evidence and create image-word associations, improving searchability of evidence.

Managing Auditing: Bank and credit card statements can be analyzed with near perfect accuracy using OCR technology and help in detecting fraudulent transactions, anomalies, or aberrations by sifting through digital databases – significantly faster than having to physically go through piles of paper. This will also help bring down auditory expenditure.

Live Casinos: OCR technologies form a pivotal part of online casinos and help recreate the climate of brick and mortar casinos, and collect pertinent data on the game, from the way the die is rolled to players’ performance, etc.

EdTech: In the wake of the COVID-19 pandemic, as many students are studying from home, OCR technology is being leveraged by EdTech firms to help guide students as they study online. An APAC-based online tutoring platform, for instance, is empowering the “doubt solving segment of online learning,” by utilising OCR technology and AI to extract text from the photos shared by students, and matching it with the accurate answer from its database.

Framework for OCR use cases

Every image use case is different, and a single code would not suffice to decode it. However, adding relevant parameters when coding and tweaking them as and when needed, can help extract text across different types and forms. Here are three broad, yet simple steps that OCR typically entails:

What does the future hold for OCR?

While OCR technology has evolved over the years, even to this day, “extracting text from an image,” is easier said than done. Absence of metrics is one of the primary challenges to leveraging OCR effectively. Much like an NLP exercise, samples need to be visually scoured through to determine satisfactory outcomes, and words and numbers need to be manually checked in few images to gauge the accuracy. However, building a deep neural net with plenty of samples can help train the models on factors such as font style and document layouts. Converting handwritten text to digital text is another instance that is yet to be perfected and still in exploratory stages – while numbers are accurately determined in most cases, alphabets are relatively more challenging, given the different font and writing styles. Leveraging neural networks and Capsule Networks can help mitigate the challenges and increase ease of working with handwritten texts. In fact, with deep learning, text can reportedly be recognised with approximately 99.73% accuracy.

A report published by Transparency Market Research noted that North America was expected to be a major player in the OCR market given the change in government policies and regulations, as well as infrastructure development, while the APAC region is expected to have the highest CAGR in the forecast period (likely driven by small & medium-sized enterprises, adoption of tech by the IT & Telecom industry for document management), and the overall growth of the market is expected to be fuelled by the increasing demand for software in the Middle East & Africa.

Furthermore, use cases are also growing far and beyond the traditional analog to digital text conversion. For instance, OCR is being utilised in researching historical texts. The University of Connecticut, for instance, shared that “The UConn Library and the School of Engineering are working to develop new technology that applies machine learning to handwriting text recognition that will allow researchers to have improved access to handwritten historic documents.” Machine learning is being applied to handwriting text recognition, and the characters identified are used to create algorithms that help to recognize patterns, and leverages them to form neural networks and systematically learns from them, akin to the processes of a human brain.

Many business leaders are also shifting their focus beyond machine learning to deep learning, because “Driven by deep learning, [OCR] is entering a new phase where it first recognizes scanned text, then makes meaning of it. The competitive edge will be given to the software that provides the most powerful information extraction and highest-quality insights.”

OCR technology has evolved over the years, albeit intermittently, and its usage is only expected to grow by leaps and bounds in the future. By leveraging the needed AI tools and techniques, we can move beyond the traditional, intended use cases. And with AI-powered OCR, pictures will not just speak a thousand words, but also be able to derive thousands of meaningful insights, devoid of human error.

Analyst | TheMathCompany

Tanvi PS