This is an automated archive made by the Lemmit Bot.

The original was posted on /r/opensource by /u/Fluffy-4477 on 2024-09-24 19:50:46+00:00.


Examples of the before and after states of a folder organized by Connor, along with the corresponding tree structure of the folder as displayed in the GUI version of the app (the tree is displayed in the CLI as well).

Connor works locally on your computer. The process begins by reading filenames and their contents from the selected folder. The cosine similarity score (between -1 and 1) between each pair of files' contents is then calculated to identify similar files based on a specified threshold. Files that score above the threshold are grouped as key-value pairs in a dictionary, where each key corresponds to a category (folder) of similar files. Latent Dirichlet Allocation (LDA) is then used to generate topic names for the contents of each category, and these become the folder names. Finally, the folders are created and the respective files are moved into them.
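
To make the grouping step more concrete, here is a minimal sketch of the idea, not Connor's actual code. It assumes a simple word-count vector as a stand-in for the sentence-transformers embeddings, and the names `embed`, `group_files`, and the `group_N` keys are made up for illustration. Each file is compared against the first file of every existing group and joins the first group it scores above the threshold with.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts (assumption; Connor uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_files(contents, threshold=0.5):
    """Greedily group files by similarity to each group's first file.

    contents: dict mapping filename -> text.
    Returns a dict mapping a placeholder group key -> list of filenames.
    """
    groups = {}   # group key -> filenames in that group
    anchors = {}  # group key -> vector of the group's first file
    for name, text in contents.items():
        vec = embed(text)
        for key, anchor in anchors.items():
            if cosine_similarity(vec, anchor) > threshold:
                groups[key].append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            key = f"group_{len(groups) + 1}"
            groups[key] = [name]
            anchors[key] = vec
    return groups


files = {
    "a.txt": "invoice payment due total amount",
    "b.txt": "invoice total amount paid",
    "c.txt": "hiking trail mountain summit",
}
groups = group_files(files, threshold=0.5)
print(groups)
```

In this toy example the two invoice files share enough words to land in one group, while the hiking file starts its own. A real embedding model would also catch files that are similar in meaning but share few exact words.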

The tool uses a pre-trained NLP model, sentence-transformers/paraphrase-MiniLM-L6-v2, to understand the meaning of the data and calculate the cosine similarity.

Links: Repo and PyPI

End-user (you) can:

  1. Organize files within a selected folder or manually uploaded files (uploading files is only supported in the GUI).
  2. Organize text-based files (.docx, .txt, .pdf, etc.) using NLP based on their content.

Customization Options:

  1. Similarity Threshold: Allows you to choose a similarity percentage threshold for grouping similar files.
  2. Reading Word Limit: You can set a limit on the number of words to read from the file content.
  3. Folder Name Word Limit: You can specify a maximum number of words allowed in the created folder names.
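
As a rough illustration of how the two word limits could work, here is a small sketch under stated assumptions. It is not Connor's implementation: plain word frequency stands in for LDA, and `limit_words`, `folder_name`, and `STOP_WORDS` are hypothetical names invented for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}


def limit_words(text, reading_word_limit):
    """Keep only the first `reading_word_limit` whitespace-separated words."""
    return " ".join(text.split()[:reading_word_limit])


def folder_name(texts, folder_name_word_limit):
    """Name a group after its most frequent non-stop words.

    Word frequency is a simple stand-in for LDA topic words (assumption).
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    top = [word for word, _ in counts.most_common(folder_name_word_limit)]
    return "_".join(top)


docs = [
    "The invoice for the invoice total",
    "Invoice total and payment of the invoice",
]
# Reading Word Limit: only the first 4 words of each file are considered.
trimmed = [limit_words(d, 4) for d in docs]
# Folder Name Word Limit: the folder name uses at most 2 words.
name = folder_name(trimmed, 2)
print(trimmed, name)
```

A lower reading limit makes processing faster but gives the naming step less text to work with, so the folder name may reflect only the opening words of each file.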

The GUI doesn't fail gracefully yet, so it needs some updates, which will come soon; otherwise it's perfectly functional. The CLI is completely finished, and I will improve its efficiency as I keep working on it. I would appreciate any feedback you have. Thanks so much for reading!