utility to "inventory" and "catalog" disks, media, etc and the folders & files
I inherited external storage and media cards that look like stacked firewood.
Can someone help me with what to build, and how to build it, to meet my requirements? I suspect that something might already exist given the explosion of game, music and video collections. (My tool of preference is Python coupled with some DBMS.)
I seek a utility that will scan each of these then:
gather file details
collect file details within each folder
recurse into any sub-folders as needed
TAG each file and folder with a unique device identifier
store the data so that I can search for content of interest
stored data should enable processing to locate duplicate content
not only within a given folder or device but across all devices
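One way the scan and the duplicate check in the list above might look in Python is sketched below. This is only a sketch under assumptions: the function names (sha256_of, scan_tree, find_duplicates), the record fields, and the choice of SHA-256 as the content fingerprint are illustrative, not something the thread has settled on.

```python
import hashlib
import os
import stat
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root, device_id):
    """Yield one record per regular file under root, tagged with device_id."""
    for folder, _subfolders, files in os.walk(root):  # os.walk handles recursion
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.lstat(path)
                if not stat.S_ISREG(info.st_mode):
                    continue  # skip symlinks, sockets, device nodes
                checksum = sha256_of(path)
            except OSError:
                continue  # unreadable entry; a real tool would log this
            yield {
                "device_id": device_id,  # hypothetical tag, e.g. a serial number
                "path": os.path.relpath(path, root),
                "size": info.st_size,
                "mtime": info.st_mtime,
                "sha256": checksum,
            }


def find_duplicates(records):
    """Group records by (size, sha256); return only groups with 2+ members."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["size"], record["sha256"])].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

To look for duplicates across devices, the records from every scanned device would be passed to find_duplicates together; each record carries its own device_id, so a group can span several devices. Hashing every file is slow on large drives, so comparing sizes first and hashing only the candidates may be worth considering.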
From an operational perspective things might go something like this:
insert or connect the storage device of interest
learn the unique ID of the device
test if we have seen this device before
if not, then add it to the inventory
scan the device for folders and files
save the scan results with the device ID as an attribute
option 1 — create a work file of results
post the work file contents tied to the device ID
option 2 — post the results directly
report a summary, subject to --quiet option, of the completed scans
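The operational steps above could be sketched with Python's built-in sqlite3 module roughly as follows. The table layout and the function names (open_catalog, register_device, save_scan, summarize) are assumptions made for illustration, and this follows "option 2" (posting results directly) rather than a work file.

```python
import sqlite3

# Hypothetical schema: one row per device, one row per file on a device.
SCHEMA = """
CREATE TABLE IF NOT EXISTS devices (
    device_id  TEXT PRIMARY KEY,
    label      TEXT,
    first_seen TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS files (
    device_id TEXT NOT NULL REFERENCES devices(device_id),
    path      TEXT NOT NULL,
    size      INTEGER,
    sha256    TEXT,
    PRIMARY KEY (device_id, path)
);
"""


def open_catalog(db_path=":memory:"):
    """Open (or create) the catalog database and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def register_device(conn, device_id, label=None):
    """Add the device if it has not been seen before; return True when new."""
    seen = conn.execute(
        "SELECT 1 FROM devices WHERE device_id = ?", (device_id,)
    ).fetchone()
    if seen:
        return False
    conn.execute(
        "INSERT INTO devices (device_id, label) VALUES (?, ?)", (device_id, label)
    )
    conn.commit()
    return True


def save_scan(conn, device_id, rows):
    """Store (path, size, sha256) tuples for one device; return rows written."""
    rows = [(device_id, path, size, digest) for path, size, digest in rows]
    conn.executemany(
        "INSERT OR REPLACE INTO files (device_id, path, size, sha256) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def summarize(conn):
    """Return (device_id, file_count) pairs for the end-of-run summary."""
    return conn.execute(
        "SELECT d.device_id, COUNT(f.path) FROM devices d "
        "LEFT JOIN files f ON f.device_id = d.device_id "
        "GROUP BY d.device_id ORDER BY d.device_id"
    ).fetchall()
```

A command-line wrapper would print the result of summarize unless --quiet was given. Using INSERT OR REPLACE means a rescan of a known device overwrites its earlier rows instead of failing on the primary key.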
QUESTION: Is there some way to read the SERIAL_NUMBER or UUID of any "storage" device I might connect?
As with any contemporary workstation, I have an internal & external mix of HDD, SSD, NVME, media cards, and thumb/key drives. These are connected by USB or Thunderbolt. {I'll worry about eSATA and (ancient) real SCSI once I get some progress here.}
I know that I'm asking my LQ colleagues to help me design an application. The last time I did serious lines-of-code development was early Linux in the late 90s. I've retired from "management" and want to learn how to get my hands dirty with current tools and features.
I thank everyone who is willing to comment.
~~~ 0;-Dan
Last edited by SaintDanBert; 12-29-2023 at 01:57 PM.
I discovered (and should have known, blush, grin) another issue that I'll need to deal with, given the large number of devices from a large set of sources over a long career...
Specifically: Everything "Linux" has ownership and permissions. While the user and group names may all look the same, what really matters are the numeric user and group IDs. Any access attempt from my current workstation(s) will likely fail because my current ID numbers won't match whatever was assigned who-knows-when.
In addition, given the large sizes of some of the devices, there are likely multiple file systems on a single device. This will require processing to iterate across all of these file systems.
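As a rough illustration of iterating over every file system on one device, the sketch below walks the nested JSON that lsblk prints with --json. It assumes the output was produced with "lsblk --json -o NAME,FSTYPE,UUID,LABEL,MOUNTPOINT"; the function name iter_filesystems and the sample device names are made up for the example.

```python
import json


def iter_filesystems(nodes, disk=None):
    """Yield every block-device node that carries a file system.

    `nodes` is the list under "blockdevices" in lsblk's JSON output;
    partitions appear in nested "children" lists, so we recurse.
    """
    for node in nodes:
        top = disk or node["name"]  # remember which physical disk we are under
        if node.get("fstype"):
            yield {
                "disk": top,
                "name": node["name"],
                "fstype": node["fstype"],
                "uuid": node.get("uuid"),
                "label": node.get("label"),
                "mountpoint": node.get("mountpoint"),
            }
        yield from iter_filesystems(node.get("children") or [], top)


# Hand-written sample in the shape lsblk emits; not captured from a real drive.
SAMPLE = """
{"blockdevices": [
  {"name": "sdb", "fstype": null, "uuid": null, "label": null, "mountpoint": null,
   "children": [
     {"name": "sdb1", "fstype": "ext4", "uuid": "1111-aaaa", "label": "photos",
      "mountpoint": "/mnt/photos"},
     {"name": "sdb2", "fstype": "vfat", "uuid": "2222-BBBB", "label": null,
      "mountpoint": null}
   ]}
]}
"""

if __name__ == "__main__":
    for fs in iter_filesystems(json.loads(SAMPLE)["blockdevices"]):
        print(fs["disk"], fs["name"], fs["fstype"], fs["mountpoint"])
```

A file system with no mountpoint would have to be mounted (read-only is probably the safer choice) before its folders and files can be scanned.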
ANALYSIS
When scanning the newly connected device, the utility might collect and record ownership details.
The utility might implement an app-specific user & group, much as web servers or database servers do.
Once the device has been scanned, we might "chown -R {user}:{group} /device".
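Recording the ownership details during the scan might look something like the sketch below. The function name ownership_of and the field names are hypothetical. The numeric IDs are what is stored on the device; the names are only whatever the current workstation maps those numbers to, if anything.

```python
import grp
import os
import pwd
import stat


def ownership_of(path):
    """Return the numeric owner/group of path plus any local name mapping."""
    info = os.lstat(path)  # lstat: describe a symlink itself, not its target
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = None  # this UID is unknown on the current workstation
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = None  # this GID is unknown on the current workstation
    return {
        "uid": info.st_uid,
        "gid": info.st_gid,
        "user_here": user,
        "group_here": group,
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
    }
```

Saving these values before any chown keeps a record of the original ownership, since a recursive chown overwrites it on the device.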
(grimace) I'm only a couple of weeks into this and I already have both an application and database to design and deploy.
UUIDs, while fabulous at uniquely identifying, are quite cumbersome for human memories to work with. Filesystems can have volume LABELs assigned that are easier for most humans to manage, yet adequately unique. e2label and tune2fs are among tools that can assign LABELs to existing filesystems. I suggest you employ LABELs in your cataloging and analysis.
Drives and memory cards have serial numbers, and there are several utilities that will output the information. lsblk may be the easiest to parse. It can also output UUIDs and labels.
Quote:
lsblk --nodeps -o name,serial
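Since Python was mentioned as the preferred tool, the same lsblk call could be wrapped roughly like this. It is a sketch that assumes a util-linux lsblk with --json support; the function names parse_serials and read_serials are made up here.

```python
import json
import subprocess


def parse_serials(lsblk_json):
    """Map device name -> serial number from lsblk's JSON text."""
    data = json.loads(lsblk_json)
    return {dev["name"]: dev.get("serial") for dev in data.get("blockdevices", [])}


def read_serials():
    """Run lsblk for whole disks only and return {name: serial}."""
    result = subprocess.run(
        ["lsblk", "--json", "--nodeps", "-o", "NAME,SERIAL"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lsblk fails
    )
    return parse_serials(result.stdout)


if __name__ == "__main__":
    for name, serial in read_serials().items():
        print(name, serial or "(no serial reported)")
```

Some devices, card readers and cheap USB sticks in particular, may report an empty or non-unique serial, so a fallback such as the file system UUID or LABEL may be needed.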
What types of files? What file details do you want to collect?
Which DBMSs are you most familiar with? For SQL databases, MariaDB, PostgreSQL or maybe SQLite?