This is an automated archive made by the Lemmit Bot.

The original was posted on /r/opensource by /u/qooooob on 2023-07-05 08:46:12+00:00.


I’m currently exploring options for extracting text from PDFs - both with and without OCR.

I’ve noticed that most PDF extraction (and especially OCR) libraries have permissive licenses, but depend on other software that has a copy-left license like AGPL. A good example is ocrmypdf (MIT), which relies on ghostscript v9.5 (AGPL) to work. Since ocrmypdf needs ghostscript, anyone using ocrmypdf needs to abide by the rules of ghostscript as well. From what I’ve read, anyone is free to use this software, including companies. However, one cannot integrate it into their own software without releasing that software under the same license.

In the case of extracting data, the user owns the input data and the output, but obtained the output only thanks to software they did not write. From what I understand, this could be looked at from at least these two points of view:

  1. Company uses copy-left licensed software to convert their paper archive into a paperless archive. Company can then use that information to improve their efficiency or create a database, and sell access to that information. As far as I understand it, this is OK even if the company is making a profit through its use of copy-left software.
  2. Company integrates copy-left licensed software into their own solution, e.g. offering users the opportunity to upload their files and receive the output. This is not OK, because the company does not release the code of the full service the software is a part of.

Am I understanding all this correctly? So in the case of software that takes an input and transforms it into an output, the use of that software starts and ends with the transformation, as long as the user is the company itself (as in the first example above). What happens before or after is fair game. If the transformation is part of a larger service suite, then the license obligations extend to the operations before and after it, as long as they happen in the same service. In that case the user is not the company but some other person, which changes the situation.