It still extracts barcodes and other non-text marks, not just text.
That's the one remaining issue I can see.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Work in progress while testing a line-based morphology.
I can't seem to tune it for the subclusters.
Next I want to try combining the old and new techniques, that is,
the line morphology for the first pass over the full receipt and then
the letter technique for the subclustering.
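A minimal sketch of what a line-morphology pass can look like, assuming
the receipt is already binarized into a NumPy boolean array with text
pixels set to True. The names dilate_horizontal, component_boxes and
extract_lines, and the default kernel width, are hypothetical and for
illustration only; this is not the project's code. A wide 1 x width
dilation bridges the gaps between characters on the same line, so each
line becomes one connected blob whose box the letter technique could
then subcluster. In practice OpenCV's cv2.dilate and
cv2.connectedComponentsWithStats would do the same work much faster
than the pure-Python labeling loop below.

```python
from collections import deque

import numpy as np


def dilate_horizontal(binary: np.ndarray, width: int) -> np.ndarray:
    """Dilate with a 1 x width rectangular kernel by OR-ing shifted copies."""
    h, w = binary.shape
    src = binary.astype(bool)
    out = np.zeros((h, w), dtype=bool)
    half = width // 2
    for dx in range(-half, width - half):
        if abs(dx) >= w:
            continue  # a shift wider than the image contributes nothing
        shifted = np.zeros((h, w), dtype=bool)
        if dx >= 0:
            shifted[:, dx:] = src[:, : w - dx]  # shifted[y, x] = src[y, x - dx]
        else:
            shifted[:, :dx] = src[:, -dx:]
        out |= shifted
    return out


def component_boxes(mask: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Bounding boxes (y0, x0, y1, x1) of 4-connected True regions, top to bottom."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue = deque([(y, x)])
        y0, x0, y1, x1 = y, x, y, x
        while queue:  # breadth-first flood fill of one component
            cy, cx = queue.popleft()
            y0, y1 = min(y0, cy), max(y1, cy)
            x0, x1 = min(x0, cx), max(x1, cx)
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(y0), int(x0), int(y1), int(x1)))
    return sorted(boxes)


def extract_lines(binary: np.ndarray, width: int = 5) -> list[tuple[int, int, int, int]]:
    """Hypothetical helper: one bounding box per text line after horizontal dilation."""
    return component_boxes(dilate_horizontal(binary, width))


# Two synthetic "text lines" made of small character blobs with 2-pixel gaps.
img = np.zeros((16, 20), dtype=bool)
for x0 in (2, 6, 10):
    img[2:5, x0:x0 + 2] = True
for x0 in (3, 8):
    img[10:13, x0:x0 + 2] = True

print(len(component_boxes(img)))    # 5 separate characters
print(extract_lines(img, width=5))  # [(2, 0, 4, 13), (10, 1, 12, 11)]
```

The kernel width is the knob that is hard to dial in: too narrow and a
line splits into words, too wide and neighbouring columns of text merge.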
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
It does not work when there are small dots near the edge,
for example when a small piece of a character is detached or
a colon sits very low or very high.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Tried a kernel based on the character size, but it didn't work.
Also removed the old adaptive kernel, which was based on the
image size.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
The text clarifier still needs updating so that lines aren't merged
together through their characters, but it now deskews the image
between line-clustering steps.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Tried EasyOCR, but it performed poorly, so next I'm going to try
TrOCR, a PyTorch-based model released under the MIT License.
6f60612e7c/LICENSE
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Updated the refined images and added images of the extracted lines,
produced with the new autocropper and line extraction functions
respectively.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>
Line extraction generally works well. There may still be errors
in future runs, so the output needs to be checked.
Signed-off-by: Ethan Wellenreiter <ewellenreiter@gmail.com>