VXHeaven organizer

vxorg

VXHeaven organizer: converts the collection from a flat hierarchy of 270k+ files into a neat tree. Originally written in Python; I rewrote it in C++ for performance reasons.

History

  • 2018: I wrote a really shoddy attempt at doing organization in Bash. It sucked because I wasn't handling many of the idiosyncrasies of sample naming.
    • It was also very primitive and slow, since it would continually spawn mv processes just to move files. (Same for mkdir, but that is less of a concern since it's done less often.)
  • 2023: I wrote a new script in Python. It was "better" but still didn't work.
    • I actually made the same mistake and tried to write it in Bash again, but even Python was worlds faster, so I rewrote it in Python.
  • October 21, 2024: I decided to start rewriting my Python script so that it parses sample names into an N-ary tree, for memory savings while still allowing memoization (and to be modular instead of one blob).
    • Later in the day, as an experiment, I rewrote the parsing algorithm (fixing a bug in the process) in C++. It was 100x faster, so I committed to a rewrite in C++.
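To give a rough idea of what "parse into an N-ary tree" can look like, here is a minimal sketch. It is not the code in tree.hpp; the `Node` type, the helper names, and the assumption that sample names are dot-separated (e.g. a made-up name like `Virus.DOS.Example.a`) are all illustrative. The point is only that names sharing a prefix share the nodes for that prefix, which is presumably where the memory savings come from.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type; the real tree.hpp may look quite different.
struct Node {
    // Children keyed by name component, so shared prefixes are stored once.
    std::map<std::string, std::unique_ptr<Node>> children;
    bool is_sample = false;  // true if a sample name ends at this node
};

// Split a name on '.' (assumed separator), skipping empty components.
std::vector<std::string> split_name(const std::string& name, char sep = '.') {
    std::vector<std::string> parts;
    std::stringstream ss(name);
    std::string part;
    while (std::getline(ss, part, sep)) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Insert one sample name, reusing nodes that already exist.
void insert(Node& root, const std::string& name) {
    Node* cur = &root;
    for (const auto& part : split_name(name)) {
        auto& child = cur->children[part];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
    }
    cur->is_sample = true;
}

// Count all nodes below the given node (the node itself is not counted).
std::size_t count_nodes(const Node& node) {
    std::size_t n = 0;
    for (const auto& kv : node.children) {
        n += 1 + count_nodes(*kv.second);
    }
    return n;
}
```

Inserting `Virus.DOS.Example.a` and `Virus.DOS.Example.b` into an empty root creates the `Virus`, `DOS` and `Example` nodes only once; the two names differ only in their last node.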

Building

make

Usage

  • Generate a list of samples.
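    • tar tf xxx/viruses-2010-05-18.tar.bz2 | sed 's/\.\///g' | awk NF | sort > list is one option. Not the best, but it's (basically) what I did.
  • Run with ./vxorg list src/ dest/, where list is the sample list from the previous step, src/ is the directory holding the flat files, and dest/ is where the tree gets built.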
    • dest/ will be created if it does not exist.
    • It will show a progress bar while it runs.