5 Notes
ewellenr edited this page 2023-10-24 09:48:56 -04:00

stb header file(s) are dual-licensed: MIT licence or public domain, whichever you prefer (https://github.com/nothings/stb).

OpenCV is under the Apache 2.0 licence since version 4.5.0 (older versions are 3-clause BSD), so it's pretty much free to use, including commercially.

Idea(s) for text ML:

  1. Determine what text is written (recognition), then determine what text is what (which piece of text is which kind of information).
  2. Determine the text using reinforcement learning or an unusual loss: superimpose the predicted text over the picture and take an L2 (or similar) loss between the generated superimposed text and the original text in the picture. Might help because it's self-supervised, but it's computationally hard because it has to position, scale and rotate the text and pick a font (from a list of selectable fonts). This really only determines the text, though, and doesn't help with figuring out what is what.
  3. Determine what text is what, but also somehow encode each text's position in the picture as an input, to create a relatively linear convolutional neural network.
  4. Once the rotation is corrected, use OCR to extract the text, then use some sort of algorithm to classify the text.
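
Rough sketch of the loss in idea 2, to make the cost concrete. Everything here is an assumption for illustration: the "rendered text" is a hard-coded 3x3 bitmap instead of a real font render, the picture is a tiny binary grid, and only position is searched (no scale, rotation or font choice, which is where the real cost would come from). The names `l2_loss`, `best_placement`, `paste` and `GLYPH_T` are made up for this sketch.

```python
# Hypothetical sketch of idea 2: score a rendered text patch against the
# picture with an L2 loss and brute-force the position.
# Assumptions: binary images as nested lists, a fixed bitmap stands in for
# rendered text, and scale/rotation/font search is left out.


def l2_loss(image, patch, top, left):
    """Sum of squared differences between the patch and the image region at (top, left)."""
    total = 0.0
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def best_placement(image, patch):
    """Try every offset where the patch fits; return (loss, top, left) of the lowest loss."""
    img_h, img_w = len(image), len(image[0])
    patch_h, patch_w = len(patch), len(patch[0])
    best = None
    for top in range(img_h - patch_h + 1):
        for left in range(img_w - patch_w + 1):
            loss = l2_loss(image, patch, top, left)
            if best is None or loss < best[0]:
                best = (loss, top, left)
    return best


def paste(canvas_h, canvas_w, patch, top, left):
    """Build a blank canvas and copy the patch onto it at (top, left)."""
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            canvas[top + r][left + c] = value
    return canvas


# Stand-in for a rendered glyph: a 3x3 "T".
GLYPH_T = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

# Synthetic "picture" with the glyph placed at row 2, column 4.
picture = paste(6, 8, GLYPH_T, top=2, left=4)

loss, top, left = best_placement(picture, GLYPH_T)
print(loss, top, left)
```

Even this toy version loops over every offset times every patch pixel. Adding scale, rotation and a font list multiplies the search space again, so a real version would probably need a learned or gradient-based way to propose placements rather than a grid search.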

Dataset links:

  1. https://www.kaggle.com/datasets/nandinibagga/paper-images-dataset
  2. https://paperswithcode.com/dataset/funsd
  3. https://paperswithcode.com/dataset/rvl-cdip (data on Hugging Face: https://huggingface.co/datasets/aharley/rvl_cdip)
  4. https://www.kaggle.com/datasets/jenswalter/receipts (MAYBE)
  5. https://github.com/clovaai/cord

Help: