Recognition of handwritten letters using neural networks. A quick way to recognize text

and with probability 0.1 to class C_2. The stated problem can be solved using a multilayer perceptron (MLP) with N inputs and M outputs, trained to produce the vector c at its output when the vector p is presented at its input.

During the learning process, the network builds a mapping P → C. It is not possible to obtain this mapping in its entirety, but it is possible to obtain an arbitrary number of pairs (p → c) related by the mapping. For an arbitrary vector p at the input, we can obtain approximate probabilities of class membership at the output.

It often turns out that components of the output vector are less than 0 or greater than 1, and the second condition in (1) is only approximately satisfied. This inaccuracy is a consequence of the analog nature of neural networks; most results obtained with neural networks are approximate. In addition, the conditions imposed on the probabilities are not built into the network directly during training, but are only implicitly contained in the data on which the network is trained. This is the second reason for the inexactness of the result.

There are other ways of formalization.

We will represent letters as dot (bitmap) images (Fig.).

Fig. Dot image.

A dark pixel cell in the image corresponds to I_ij = 1, and a light one to I_ij = 0. The task is to determine from the image which letter was presented.

Let's build an MLP with N_i × N_j inputs, where each input corresponds to one pixel: x_k = I_ij. The pixel brightnesses will be the components of the input vector.

As output signals, we choose the probabilities that the presented image corresponds to a given letter:

The network calculates the output:

where the output c_1 = 0.9 means, for example, that an image of the letter “A” was presented and the network is 90% sure of this; the output c_2 = 0.1 means that the image corresponds to the letter “B” with a probability of 10%, and so on.

There is another way: the network inputs are chosen in the same way, but there is only one output, the number m of the presented letter. The network learns to output the value m for the presented image I:

(I ij) → m

In this case, the disadvantage is that letters with similar numbers m, but dissimilar images, may be confused by the network during recognition.
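The input and output encodings described above can be illustrated with a short Python sketch. The function names and the tiny 3x3 image are assumptions made for the illustration; the network itself is left out, and only the packing of pixels into the input vector and the reading of the output vector are shown.

```python
def image_to_input(image):
    """Flatten an N_i x N_j grid of 0/1 pixels into one input vector, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(letter_index, num_letters):
    """Target vector: probability 1 for the presented letter, 0 for all others."""
    return [1.0 if m == letter_index else 0.0 for m in range(num_letters)]


def most_likely_letter(outputs, letters):
    """Pick the letter whose output component is the largest."""
    best = max(range(len(outputs)), key=lambda m: outputs[m])
    return letters[best]


image = [[0, 1, 0],
         [1, 0, 1],
         [1, 1, 1]]
x = image_to_input(image)           # 9 inputs, one per pixel
target = one_hot(0, 3)              # the image shows the first letter
guess = most_likely_letter([0.9, 0.1, 0.05], "ABC")
print(x, target, guess)
```

With one output per letter, two visually different letters never share neighbouring target values, which avoids the confusion described for the single-output scheme.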

The problem of image recognition is one of the most widespread problems successfully solved using artificial neural networks (ANNs). Many formulations of the problem are possible here; one of the simplest is recognition of a fixed set of characters.

Example 3.11. Letter recognition. The MatLab system provides a special function

>> [alphabet, targets] = prprob;

This function returns two binary matrices: in the alphabet matrix (size 35x26), each column encodes one letter, and the targets matrix (size 26x26) is diagonal and serves to identify the column.

Each column of alphabet corresponds to a 7x5 matrix, which is a binary image of the letter.
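To make the column layout concrete, here is a small Python sketch that splits a 35-element column into 7 rows of 5 pixels and prints it as text. It assumes the pixels are stored row by row, which matches the reshape(..., 5, 7)' call used in plotletters; the letter pattern is drawn by hand for the illustration and is not taken from prprob.

```python
def column_to_image(column, rows=7, cols=5):
    """Split a flat column of rows*cols values into a list of pixel rows."""
    if len(column) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(column)))
    return [column[r * cols:(r + 1) * cols] for r in range(rows)]


def render(image):
    """Draw dark pixels as '#' and light pixels as '.'."""
    return "\n".join("".join("#" if p else "." for p in row) for row in image)


# Hand-drawn letter "T" (illustrative data, not the prprob alphabet)
letter_t = [1, 1, 1, 1, 1] + [0, 0, 1, 0, 0] * 6
print(render(column_to_image(letter_t)))
```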

The following function displays all alphabet columns as letters (the function must be placed in the MatLab working directory):

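function plotletters(alphabet)
% Display up to the first 25 columns of alphabet as 7x5 letter images.
fprintf('plotletters is plotting the first 25 letters\n');
[m, n] = size(alphabet);
if m ~= 35
    error('plotletters needs columns 35 numbers long');
end
MM = colormap(gray);
MM = MM(end:-1:1,:);      % invert the gray map so that 1 is drawn dark
colormap(MM);
for j = 1:min(n, 25)
    subplot(5, 5, j);
    imagesc(reshape(alphabet(:,j),5,7)');
end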

The result of executing the function is shown in Fig. 3.12:

>> plotletters(alphabet);

Figure 3.12. Binary alphabet coding.

Based on the structure of the targets matrix, the neural network should have 26 output neurons. Let's set the number of neurons in the hidden layer to 10.

>> net = newff(minmax(alphabet),[10 26],{'logsig' 'logsig'},'traingdx');

>> P = alphabet; T = targets;

Let's set the number of epochs and start the learning process:

>> net.trainParam.epochs = 1000;

>> [net,tr] = train(net,P,T);

The learning curve is shown in Fig. 3.13.

Figure 3.13. Change in the error during training.

To check the quality of the trained network, consider noisy images of letters (Fig. 3.14):

>> noisyP = alphabet+randn(size(alphabet)) * 0.2;

>> plotletters(noisyP);

The following command runs the network on a noisy input set:

>> A2 = sim(net,noisyP);

Matrix A2 here contains various numbers in the range from 0 to 1. Using the function compet, you can select the maximum element in each column, assign it the value 1, and reset the remaining elements of the column to zero:

Figure 3.14. Images of letters in the presence of noise.

>> for j=1:26
A3 = compet(A2(:,j));
answer(j) = find(A3 == 1);
end
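For readers without the toolbox, the winner-take-all step can be sketched in plain Python. This is only an illustration of the idea, not the toolbox implementation: the function below reuses the name compet for convenience, the outputs are stored as a list of columns, and indices are 0-based, unlike in MatLab.

```python
def compet(column):
    """Return a 0/1 vector with a single 1 at the position of the maximum."""
    winner = max(range(len(column)), key=lambda i: column[i])
    return [1 if i == winner else 0 for i in range(len(column))]


def answers(output_columns):
    """Index of the winning output neuron for every input column."""
    return [compet(column).index(1) for column in output_columns]


# Two made-up network responses with three output neurons each
a2_columns = [[0.8, 0.3, 0.1], [0.2, 0.4, 0.9]]
print([compet(c) for c in a2_columns])
print(answers(a2_columns))
```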

You can then visually evaluate the network's responses to noisy input vectors using the commands:

>> NetLetters=alphabet(:,answer);

>> plotletters(NetLetters);

Fig. 3.15 shows the final recognition result.

Figure 3.15. Result of neural network recognition.

Obviously, some letters have been identified incorrectly. This may be a consequence of poor training of the network, of too high a noise level, or of an incorrect choice of the number of neurons in the hidden layer.

Letter recognition exercise. Various difficulty levels. A mask with noise is applied to the letter. Sometimes you need to be quick-witted in order to understand by elimination what kind of letter was in the task.

Teaching children to read and letters of the Russian alphabet. What letter is shown? Choose the correct answer on the right.

Which letter is hidden? An online game for early childhood development. Recognition of letters of the Russian alphabet

How to learn letters of the Russian alphabet

Often the letters of the Russian alphabet are taught in order, as they are written in the primer. In fact, letters should be taught in order of frequency of use. Here is a little hint: letters in the center of the keyboard are used more often than those on the periphery. Therefore, first memorize A, P, R, O, ... and leave ones such as Y, X, F, Shch for last.

What is better - teaching a child to read letters or syllables?

Many teachers teach immediately in syllables. I suggest you get around this small problem and play online games instead of learning syllables. This is how the child learns and plays at the same time. Or rather, it seems to him that he is playing and at the same time involuntarily repeating the necessary sounds.

The advantage of online games is that if you pronounce a letter incorrectly, the simulator will patiently repeat the correct answer until you remember.

Do ABC books help you learn letters? Why paper primers are still used in teaching practice

Traditionally, paper ABC books are used to teach letters. Their advantages are undeniable. If you drop the paper version on the floor, you don’t have to worry about the device breaking. Primers can be opened on a specific page and placed in a visible place. All this is not found in electronic devices.

However, programmable reading training simulators also have certain advantages, for example, they can speak, unlike their paper counterparts. Therefore, we can recommend both paper and electronic sources.

Do online exercises help you remember letters?

The main emphasis when using electronic and online games is that a person involuntarily repeats the same information many times. The more often the repetition occurs, the more firmly the information is introduced into the consciousness and brain. Therefore, online exercises are a very useful addition to traditional cubes and paper books.

At what age should a child be sent to educational centers?

Children mature at different rates. Usually, girls up to a certain age are ahead of boys in development: they begin to speak earlier, are more socially oriented, and are more receptive to learning. Boys, on the contrary, are often more withdrawn and keep to themselves. From this we can conclude that girls learn to read a little earlier than boys. But this is only a general outline. Each child is individual, and his readiness for learning can be tested in practice. Does your child enjoy attending classes? Does anything remain in his mind after the lesson?

Maybe try to study on your own, especially since riding the bus takes time, and no one understands your baby better than mom and dad.

What to do if your child does not remember letters

Studying is difficult. And it does not depend on whether it is an adult or a child. It is very, very difficult to learn. In addition, children learn only through play. Another fact is that in order to learn something, it must be practiced or repeated many times. Therefore, it is not surprising that children remember letters very poorly.

There is a separate group of children who begin to speak late and confuse not only letters but also sounds. With such children you need to draw letters together, using every material at hand: cereals, matches, pebbles, pencils. Draw a letter and ask your child to repeat it.

You can do graphic dictations, or you can play draw-and-repeat.

What to do if your baby confuses letters, for example, D and T

If a child confuses letters, this means it is too early to move on to reading words. Go back and repeat the letters. Children often confuse voiced and unvoiced letters or similar spellings, for example, P and R. Repetition practice can help. For example, you can sculpt letters together, you can make letters from the body, for example, by placing your arms to the sides to depict the letter T.

How to teach a child to memorize letters if he doesn’t want to

Repetition is the mother of learning. Repeat letters in words, repeat letters in syllables, try to guess the letters. Let the child write a letter while you try to guess it. Or do the opposite: form a letter from grains of rice, and let your son or daughter guess which letter it is. You can also write in the sand with a stick.

Why can't he pronounce the letters correctly? How to teach a child to pronounce letters clearly?

The gaps may be at the physiological level. The person does not hear himself correctly, or it seems to him that he is speaking correctly. It is very easy to check this: just record the child reading on a voice recorder and listen to the recording.

It could also be a simple lack of practice. Different people need a different number of repetitions before information is remembered, and a child is no exception. Material needs to be repeated many times and in different situations before he begins to pronounce letters and sounds correctly.

It should also be noted that you need to love children and work with them regularly. Do not let things slide.

How to teach your child the alphabet to prepare for school

You need to work with children in the form of a game, exactly as is done on this site. Another secret of training is to study in small portions. Children cannot maintain attention for more than 5 minutes, so studying longer is simply useless.

What letters should you start memorizing the alphabet with?

You need to start memorizing letters with commonly used letters. The second secret is to remember the letters that make up the child’s name, the name of mom and dad, you can add to these words the names of brother and sister, grandparents. These are my favorite names.

By the way, if you are learning to touch type, then the first word with which you need to start typing training is again your first and last name.

Does your baby need to memorize the letters of the English alphabet?

Knowing the English alphabet won't hurt. They don’t study the alphabet at school, but immediately start reading, leaving the alphabet up to the parents. It is also worth noting that large and small English letters look different and must be remembered. If your child started speaking late, then most likely, remembering Latin letters will be a problem for him.

Is it possible to teach a child to read immediately in words?

Written Russian looks the same as spoken Russian, unlike English or French, so remember the words

How to remember numbers for a preschooler

Draw numbers, count sticks, when you walk, count red and white cars, count whether there are more men or women walking down the street. Turn everything into games.

Try to read the text letter by letter yourself - not only will it take a long time, but it will also be unlike the way we actually speak. Adults do not spell - unless the word is unfamiliar or in a foreign language. Then, in order to hear it, they read it slowly and carefully pronounce the words.

Why does a preschooler forget letters? Teaching reading through games

Why does a baby forget letters even though he learned them yesterday?

Usually, a child easily remembers some letters, but not so much others. The role of an adult is to note what his ward does not succeed and give additional tasks.

Another important thing is regularity. Since for a child all learning is, frankly speaking, cramming and repetition, the learning process should be such that information is repeated at certain intervals.

Ebbinghaus (read more about this on Wikipedia) studied how quickly information that is meaningless to a person is forgotten and came to the conclusion that 40% of the information is forgotten in the first twenty minutes. And, if it is impossible to say exactly what a particular letter means, then this is tantamount to the fact that the letter is completely unfamiliar. There must be an unambiguous 100% recognition.

Repeat, repeat, repeat

For example, you are practicing the syllable (combination of letters) ON, and the child has more or less learned to recognize and read it. Add the syllable BUT to the tasks and ask the child to read words, helping with the letters that are still unfamiliar. The child can also click on the syllables and listen to the computer read them.

This project does not claim to be the best in the world and is not meant as a competitor to FineReader, but I hope that the idea of character pattern recognition using the Euler characteristic will be new.

Introduction to the Euler characteristic of an image.

The basic idea is that you take a black and white image, and assuming that 0 is a white pixel and 1 is a black pixel, then the entire image will be a matrix of zeros and ones. In this case, a black and white image can be represented as a set of fragments measuring 2 by 2 pixels; all possible combinations are presented in the figure:

In each of the images pic1, pic2, ... a red square marks the current counting step of the algorithm; inside it is one of the fragments F from the picture above. At each step the count for the matching fragment is increased, so that for the image Original we obtain a set of counts. From now on this set will be called the Euler characteristic of the image, or the characteristic set.

COMMENT: in practice, the F0 value (for the Original image this value is 8) is not used, since it is the background of the image. Therefore, 15 values will be used, starting from F1 to F15.
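The counting procedure can be sketched in Python as follows. Two details are assumptions, since the figures are not available here: the 2x2 window is moved one pixel at a time without padding, and the fragments are numbered by treating the four pixels as bits (top-left = 8, top-right = 4, bottom-left = 2, bottom-right = 1). The project's own numbering of F0 to F15 may differ.

```python
def characteristic_set(image):
    """Count how often each of the 16 possible 2x2 fragments occurs.

    image is a list of equally long rows holding 0 (white) or 1 (black).
    """
    counts = [0] * 16
    for r in range(len(image) - 1):
        for c in range(len(image[r]) - 1):
            index = (image[r][c] << 3 | image[r][c + 1] << 2
                     | image[r + 1][c] << 1 | image[r + 1][c + 1])
            counts[index] += 1
    return counts


def without_background(counts):
    """Drop F0 (the all-white fragment) and keep the 15 values F1..F15."""
    return tuple(counts[1:])


# A 2x2 black square on a white 4x4 background (illustrative image)
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
counts = characteristic_set(image)
print(without_background(counts))
```

A 4x4 image contains 9 window positions, so the 16 counts always add up to 9 here.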

Properties of the Euler characteristic of an image.

The value of the characteristic set is unique, in other words, there are no two images with the same Euler characteristic.
There is no algorithm for converting from a characteristic set to the original image; the only way is brute force.

What is the text recognition algorithm?

The idea of letter recognition is that we pre-calculate the Euler characteristic for all characters in the alphabet of the language and store this in the knowledge base. Then we will calculate the Euler characteristic for parts of the recognized image and search for it in the knowledge base.

Recognition stages:

The image can be either black and white or color, so the first stage is approximation of the image, that is, obtaining black and white from it.
We make a pixel-by-pixel pass through the entire image in order to find black pixels. When a shaded pixel is detected, a recursive operation is launched to search for all shaded pixels adjacent to the one found and subsequent ones. As a result, we will receive a fragment of the image, which can be either a whole character or a part of it, or “garbage” that should be discarded.
After finding all the unconnected parts of the image, the Euler characteristic is calculated for each.
Next, the analyzer comes into operation and, by going through each fragment, determines whether the value of its Euler characteristic is in the knowledge base. If we find the value, we consider that it is a recognized fragment of the image, otherwise we leave it for further study.
Unrecognized parts of the image are subjected to heuristic analysis: based on the value of the Euler characteristic, an attempt is made to find the most suitable value in the knowledge base. If nothing is found, an attempt is made to “glue together” nearby fragments and search the knowledge base for the combined result. Why is gluing needed? Not all characters consist of one continuous image. For example, the exclamation mark "!" contains 2 segments (a stick and a dot), so before looking it up in the knowledge base you need to calculate the total value of the Euler characteristic from both parts. If an acceptable result cannot be found even after gluing with adjacent segments, the fragment is considered garbage and skipped.
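The fragment search from the second stage can be sketched in Python. This is a minimal illustration rather than the project's code: an explicit stack replaces the recursion described above, so that large fragments cannot hit Python's recursion limit, and treating diagonal neighbours as adjacent is an assumption.

```python
def find_fragments(image):
    """Group black pixels (value 1) into fragments of mutually adjacent pixels.

    image is a list of equally long rows; each fragment is returned as a
    sorted list of (row, column) coordinates.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    fragments = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                # Visit the 8 neighbours of the current pixel
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            fragments.append(sorted(pixels))
    return fragments


# A one-pixel-wide exclamation mark: a stick and a separate dot
image = [[1], [1], [0], [1]]
print(find_fragments(image))
```

The exclamation mark yields two fragments, which is exactly the case where gluing is needed before the knowledge base lookup.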

System composition:

Knowledge base: a file or files, originally created by me or by someone else, containing the characteristic sets of characters required for recognition.
Core: contains the basic functions that perform recognition.
Generator: a module for creating a knowledge base.

ClearType and anti-aliasing.

So, as input we have an image to be recognized, and the goal is to make it black and white, suitable for starting the recognition process. It would seem that nothing could be simpler: count all white pixels as 0 and all the rest as 1. But it is not that simple. Text in an image can be anti-aliased or not. Anti-aliased characters look smooth and free of jagged corners, while characters without anti-aliasing show visible pixels along their outline on modern monitors. With the advent of LCD (liquid crystal) screens, ClearType (for Windows) and other kinds of anti-aliasing were created that take advantage of the structure of the monitor matrix. The pixels of the text image change color, after which the text looks much “softer”. To see the result of anti-aliasing, type a letter (or some text) in, for example, mspaint and zoom in: the text turns into a multi-colored mosaic.

What's the matter? Why do we see an ordinary symbol at a small scale? Are our eyes deceiving us? The fact is that a pixel of an LCD monitor is not a single element that can take on the desired color, but consists of 3 subpixels of 3 colors, which together are enough to produce the desired color. The goal of ClearType is therefore to obtain the most visually pleasing text by exploiting this feature of the LCD matrix, and this is achieved with subpixel rendering. Anyone who has a magnifying glass can, as an experiment, magnify any part of a switched-on screen and see the matrix as in the picture below.

The figure shows a square of 3x3 pixels of the LCD matrix.

Attention! This feature complicates obtaining a black and white image and greatly affects the result, since it does not always make it possible to obtain the same image, the Euler characteristic of which is saved in the knowledge base. Thus, the difference in images forces a heuristic analysis, which may not always be successful.

Obtaining a black and white image.

I was not satisfied with the quality of the color to black-and-white conversion algorithms I found on the Internet. After applying them, the images of characters that had undergone subpixel rendering differed in width, and breaks in the letter strokes and stray garbage appeared. As a result, I decided to obtain black-and-white images by analyzing pixel brightness. All pixels brighter than (with a value greater than) 130 units were considered black, the rest white. This method is not ideal and still gives unsatisfactory results if the brightness of the text changes, but at least it produced images similar to those stored in the knowledge base. The implementation can be seen in the LuminosityApproximator class.
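A brightness threshold of this kind can be sketched in Python. This is not the LuminosityApproximator code: the text does not say how brightness is measured, so the common Rec. 601 luma weights are assumed, and the function and parameter names are hypothetical. The default comparison follows the rule as stated above (brighter than 130 means black); the black_if_brighter flag is added so that the opposite convention, dark text on a light background measured by luma, can be tried as well.

```python
def brightness(rgb):
    """Approximate brightness of an (R, G, B) pixel on a 0..255 scale.

    The Rec. 601 weights are an assumption made for this sketch.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_black_and_white(pixels, threshold=130, black_if_brighter=True):
    """Map every pixel to 1 (black) or 0 (white) by comparing with threshold."""
    result = []
    for row in pixels:
        out = []
        for rgb in row:
            brighter = brightness(rgb) > threshold
            out.append(1 if brighter == black_if_brighter else 0)
        result.append(out)
    return result


pixels = [[(255, 255, 255), (0, 0, 0), (200, 40, 40)]]
print(to_black_and_white(pixels))
print(to_black_and_white(pixels, black_if_brighter=False))
```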

Knowledge base.

The initial idea for filling the knowledge base was this: for each letter of the language I would calculate the Euler characteristic of the rendered symbol image for the 140 fonts installed on my computer (C:\Windows\Fonts), for all font styles (Regular, Bold, Italic) and sizes from 8 to 32, thereby covering all, or almost all, variations of the letters and making the base universal. Unfortunately, this turned out to be not as good as it seems. Under these conditions, this is what I got:

The knowledge base file turned out to be quite large (about 3 megabytes) for Russian and English, even though the Euler characteristic is stored as a simple string of 15 digits and the file itself is a compressed archive (DeflateStream) that is unpacked in memory.
It takes about 10 seconds to deserialize the knowledge base. The time for comparing characteristic sets suffered as well. I could not find a suitable function for calculating GetHashCode(), so I had to compare the sets bit by bit. Compared to a knowledge base of 3-5 fonts, the time for text analysis with a base of 140 fonts increased by 30-50 times. Note that identical characteristic sets are not stored twice in the knowledge base, even though some characters may look the same in different fonts, for example at font sizes 20 and 21.
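As a point of comparison, in a language with built-in hashing of tuples the exact-match lookup can be done through a dictionary instead of comparing sets one by one. The Python sketch below only illustrates that idea; it says nothing about how the C# project stores its data, and the function names and sample values are made up.

```python
def build_knowledge_base(samples):
    """Index characteristic sets by value.

    samples is an iterable of (character, characteristic_set) pairs, where
    each set is a sequence of 15 counts (F1..F15). Identical sets are stored
    once, together with every character that produced them.
    """
    base = {}
    for character, counts in samples:
        base.setdefault(tuple(counts), set()).add(character)
    return base


def lookup(base, counts):
    """Return the characters whose stored set equals counts exactly."""
    return base.get(tuple(counts), set())


# Made-up characteristic sets, for illustration only
samples = [
    ("O", [1] * 15),
    ("0", [1] * 15),
    ("I", [2] + [0] * 14),
]
base = build_knowledge_base(samples)
print(len(base), lookup(base, [1] * 15))
```

The example also shows why identical sets are a problem for recognition: two different characters can end up under the same key.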

Therefore, I had to create a small knowledge base that goes inside the Core module and makes it possible to check the functionality. There is a very serious problem when filling the database. Not all fonts display small characters correctly. Let's say the character "e" when rendered in size 8 font named "Franklin Gothic Medium" turns out to be:

And it bears little resemblance to the original. Moreover, adding it to the knowledge base would greatly worsen the results of the heuristic, since matching against symbols like this one is misleading. The same symbol image was obtained in different fonts for different letters. The process of filling the knowledge base needs to be supervised, so that every symbol image is checked by a person for correspondence to its letter before it is saved. But, unfortunately, I don't have that much energy and time.

Character search algorithm.

I will say right away that I initially underestimated the search problem and forgot that symbols can consist of several parts. It seemed to me that during the pixel-by-pixel pass I would encounter a symbol, find its parts, if any, combine them, and analyze them. A typical pass would look like this: I find the letter "H" (in the knowledge base) and assume that all fragments below its top point and above its bottom point belong to the current line and should be glued together:

But this is an ideal situation; during recognition, I had to deal with torn images, which, in addition to everything, could have a huge amount of garbage located next to the text:

This image of the word "yes" will try to explain the complexity of the analysis. We will assume that this is a complete string, but b13 and i6 are fragments of garbage as a result of approximation. The character "y" is missing a period, and none of the characters are present in the knowledge base to say with certainty that we are dealing with a line of text from the "c" to the "i" line. And the line height is very important to us, since for gluing we need to know how close the fragments should be “glued together” and analyzed. After all, there may be a situation where we accidentally start gluing together characters from two strings and the results of such recognition will be far from ideal.

Heuristics in image analysis.

What are heuristics in image recognition? This is the process by which a characteristic set not present in the knowledge base is recognized as a correct letter of the alphabet. I thought for a long time about how to perform the analysis, and in the end the most successful algorithm turned out to be this:

I find all the characteristic sets in the knowledge base that share the largest number of equal fragment values F with the image being recognized.
Next, among these I keep only the characteristic sets in which every fragment value F that is not equal differs from the image being recognized by no more than ±1 unit. All of this is computed for each letter of the alphabet.
Then I find the symbol that occurs most often among the remaining sets and take it as the result of the heuristic analysis.
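The three steps can be sketched in Python as follows. This is one possible reading of the description, not the project's implementation: the function name, the list-of-pairs layout of the knowledge base, and the shortened three-value sets in the example are all assumptions (real sets hold 15 values).

```python
from collections import Counter


def heuristic_match(unknown, knowledge_base):
    """Guess a character for a characteristic set missing from the base.

    knowledge_base is a list of (character, characteristic_set) pairs.
    Returns None when no stored set is close enough.
    """
    def equal_count(known):
        return sum(1 for a, b in zip(unknown, known) if a == b)

    if not knowledge_base:
        return None
    # Step 1: keep the sets sharing the largest number of equal F values.
    best = max(equal_count(known) for _, known in knowledge_base)
    closest = [(ch, known) for ch, known in knowledge_base
               if equal_count(known) == best]
    # Step 2: values that are not equal may differ by at most one unit.
    near = [ch for ch, known in closest
            if all(abs(a - b) <= 1 for a, b in zip(unknown, known))]
    if not near:
        return None
    # Step 3: the character that occurs most often wins.
    return Counter(near).most_common(1)[0][0]


base = [
    ("a", (1, 2, 3)),
    ("a", (3, 2, 3)),
    ("b", (2, 2, 4)),
    ("c", (2, 2, 9)),
    ("d", (0, 0, 0)),
]
print(heuristic_match((2, 2, 3), base))
```

In the example, "c" matches as many values as the others but is rejected in step 2 because its third value is too far away, and "a" wins the vote in step 3.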

This algorithm does not give the best results on small character images (font size 7 - 12) . But it may be due to the fact that the knowledge base contains characteristic sets for similar images of different symbols.

An example of use in C#.

An example of starting recognition for the image stored in the variable image. The result variable will contain the text:

var recognizer = new TextRecognizer(container);
var report = recognizer.Recognize(image);
// Raw text.
var result = report.RawText();
// List of all fragments and the recognition state of each one.
var fragments = report.Symbols;

Demo project.

For a visual demonstration of how it works, I wrote a WPF application. It is launched from the project named "Qocr.Application.Wpf". An example of a window with the recognition result is shown below:

To recognize an image you will need:

Click "New Image" and select an image for recognition.
Use the "Black and White" mode to see which image will actually be analyzed. If you see an extremely low-quality image, do not expect good results. To improve the results, you can try writing your own color to black-and-white converter.
Choose the language with "Language".
Click "Recognize".

All image fragments should be marked with an orange or green frame.
An example of English-language text recognition:

Based on the book "Neural Networks and Deep Learning" by Michael Nielsen.

I divided the translation into several articles on Habr to make it easier to read:
Part 1) Introduction to Neural Networks
Part 2) Construction and gradient descent
Part 3) Implementation of a network for digit recognition
Part 4) A little about deep learning

Introduction

The human visual system is one of the most amazing in the world. Each hemisphere of our brain has a primary visual cortex containing 140 million neurons with tens of billions of connections between them. And there is not just one such cortex but several, and together they form a real supercomputer in our head, finely adapted by evolution to perceiving the visual side of our world. The difficulty of recognizing visual patterns becomes obvious, though, if you try to write a program to recognize, say, handwritten digits.

The simple intuition that “a 9 has a loop at the top and a vertical tail at the bottom” is not so easy to express algorithmically. Neural networks instead take examples and infer rules from them. Moreover, the more examples we show the network, the more it learns about handwritten digits, and the more accurately it classifies them. We will write a program in 74 lines of code that recognizes handwritten digits with an accuracy of over 96%; later improvements can raise that above 99%. So, let's go!

Perceptron

What is a neural network? To begin with, I will explain the model of an artificial neuron. The perceptron was developed in the 1950s by Frank Rosenblatt, and today we will use one of the main models that grew out of it, the sigmoid neuron. So how does it work? A perceptron takes a vector $x = (x_1, x_2, \ldots, x_n)$ as input and returns a single output value.

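Rosenblatt proposed a simple rule for calculating the output value. He introduced the concept of the “significance”, or “weight”, $w_j$ of each input value. In our case, the output will depend on whether the weighted sum $\sum_j w_j x_j$ is greater or less than a certain threshold value:

$$\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$$

And that's all we need! By varying the weight vector $w$ and the threshold, one can obtain completely different decision-making models. Now let's return to the neural network.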

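So, we see that the network consists of several layers of neurons. The first layer is called the input layer, or receptors, the next layer is the hidden layer, and the last is the output layer. The condition $\sum_j w_j x_j > \text{threshold}$ is quite cumbersome, so let's replace the sum with the scalar product of vectors, $w \cdot x = \sum_j w_j x_j$. Next, let's put $b = -\text{threshold}$, call it the perceptron's offset, or bias, and move it to the left side. We get:

$$\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$$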
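The perceptron rule in its bias form (output 1 when the weighted sum of the inputs plus the bias b is positive, otherwise 0) fits in a few lines of Python. The weights and bias in the example are chosen by hand for illustration, following the well-known NAND-gate example from Nielsen's book.

```python
def perceptron(x, w, b):
    """Return 1 if w . x + b > 0, otherwise 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0


# With w = (-2, -2) and b = 3 the perceptron behaves like a NAND gate
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (-2, -2), 3))
```

Only the input (1, 1) gives a negative sum (-2 - 2 + 3 = -1), so it is the only input for which the output is 0.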

Learning problem

To see how learning might work, let's assume we slightly change some weight or bias in the network. We want this small change in weight to cause a small corresponding change in the output of the network. Schematically it looks like this:

If this were possible, we could adjust the weights in a direction favorable to us and gradually train the network. The problem is that a small change in the weight of a particular neuron can make its output completely “flip” from 0 to 1. This can lead to a large prediction error for the entire network, but there is a way around this problem.

Sigmoid neuron

We can overcome this problem by introducing a new type of artificial neuron called a sigmoid neuron. Sigmoid neurons are similar to perceptrons, but are modified so that small changes in their weights and bias cause only a small change in their output. The structure of a sigmoid neuron is similar, but now its inputs can take any value between 0 and 1, and its output is $\sigma(w \cdot x + b)$, where

$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$

It would seem that these are completely different cases, but I assure you that the perceptron and the sigmoid neuron have a lot in common. Suppose that $z = w \cdot x + b \to \infty$; then $e^{-z} \to 0$ and therefore $\sigma(z) \to 1$. The converse is also true: if $z \to -\infty$, then $e^{-z} \to \infty$ and $\sigma(z) \to 0$. Obviously, a sigmoid neuron is a smoothed perceptron. And indeed:
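A short numerical check in Python shows both the limiting behaviour and the smoothness. It assumes the standard logistic function 1 / (1 + exp(-z)) as the sigmoid; the inputs and weights are made-up values chosen so that a tiny weight change moves the weighted sum across zero.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def step_neuron(x, w, b):
    """Perceptron: hard threshold at zero."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0


def sigmoid_neuron(x, w, b):
    """Sigmoid neuron: the same weighted sum passed through the sigmoid."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


x, b = (1.0,), 0.0
for w in [(-0.01,), (0.01,)]:
    print(w, step_neuron(x, w, b), round(sigmoid_neuron(x, w, b), 4))

# Far from zero the sigmoid neuron behaves like a perceptron
print(sigmoid(20.0), sigmoid(-20.0))
```

Changing the weight from -0.01 to 0.01 flips the perceptron from 0 to 1, while the sigmoid neuron's output moves only slightly around 0.5.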

Neural Network Architecture

Designing the input and output layers of a neural network is quite simple. For example, let's say we're trying to determine whether a handwritten "9" is in an image or not. A natural way to design a network is to encode the image pixel intensities into the input neurons. If the image has size $n \times m$ pixels, then we have $n \cdot m$ input neurons. The output layer has one neuron holding the output value: if it is greater than 0.5, there is a “9” in the image, otherwise there is not. While designing the input and output layers is a fairly simple task, choosing a hidden layer architecture is an art. Researchers have developed a variety of hidden layer design heuristics, such as ones that help trade off the number of hidden layers against network training time.
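The layout described above can be sketched in Python as a forward pass through a tiny network: one input per pixel, one hidden layer, and a single output compared with 0.5. The weights below are picked by hand so the example is deterministic; they are not trained and do not detect a real "9", and the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(x, layers):
    """Propagate x through the layers.

    layers is a list of (weights, biases) pairs; weights[k] holds the
    incoming weights of neuron k in that layer.
    """
    activation = x
    for weights, biases in layers:
        activation = [sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
                      for row, b in zip(weights, biases)]
    return activation


def contains_nine(image, layers):
    """Flatten the image into the input vector and threshold the single output."""
    x = [pixel for row in image for pixel in row]
    return feedforward(x, layers)[0] > 0.5


# 2x2 image -> 4 input neurons, 2 hidden neurons, 1 output neuron
layers = [
    ([[1, 1, 1, 1], [-1, -1, -1, -1]], [0, 0]),  # hidden layer
    ([[2, -2]], [0]),                            # output layer
]
print(contains_nine([[1, 1], [1, 1]], layers))
print(contains_nine([[0, 0], [0, 0]], layers))
```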

So far we have used neural networks in which the output from one layer is used as the input signal for the next; such networks are called feedforward networks. However, there are other models of neural networks in which feedback loops are possible. These models are called recurrent neural networks. Recurrent neural networks have been less influential than feedforward networks, in part because the training algorithms for recurrent networks are (at least to date) less efficient. But recurrent networks are still extremely interesting. They are much closer in spirit to how our brains work than feedforward networks. And it is possible that recurrent networks can solve important problems that feedforward networks can solve only with great difficulty.

So, that’s all for today, in the next article I will talk about gradient descent and training our future network. Thank you for your attention!