I am currently looking for an embeddable database (C++, Win32), and I find SQLite quite charming. However, I wonder whether it even makes sense to store file paths along with file properties in an SQL database. The number of files ranges from a few hundred or a few thousand up to millions or even billions on a server system. This is for software that explores disk contents (but not the contents of the files themselves).
What I have in mind is one table that stores the directory part of each path and another that stores the file properties (including the file name). The latter would contain a back-reference to its "parent" folder.
I am also unsure whether the directory table should store the full path of each directory, which would mean storing redundant information such as:
ID | Name
0 | C:
1 | C:\Windows
2 | C:\Windows\System32
3 | C:\Windows\System32\config
instead of:
ID | Name | Parent
0 | C: | NULL
1 | Windows | 0
2 | System32 | 1
3 | config | 2
Of course, I cannot get too "greedy" about saving storage and memory and also store only a single instance of each string (each path component), unless there is some kind of pruning or reference counting.
Which one would you consider superior and why? Wouldn't the second method impose a performance penalty?
Also, are there any FLOSS projects that implement something similar (storing hierarchical path names along with properties), preferably already using SQLite?
In the schema I have in mind, the file C:\Windows\System32\config\SYSTEM would be represented by something like:
ID | Name | Folder | Size | Attributes | ...
42 | SYSTEM | 3 | 1024000 | 0x00000301 | ...
SQLite should easily be able to handle this; see the "Appropriate Uses For SQLite" page on the SQLite website.
I'd prefer the second, self-joined form of your table. SQLite should have no problem following the ID contained in the Parent field back to the ID (which should have an index). But the Name field should have an index, too. This will enable quick lookup of existing folders when you insert a new entry into the table.
User contributions licensed under CC BY-SA 3.0