VXHeaven organizer
vxorg
VXHeaven organizer: converts the collection from a flat hierarchy of 270k+ files into a neat tree. It was originally written in Python; I rewrote it in C++ for performance reasons.
History
- 2018: I wrote a really shoddy attempt at doing organization in Bash. It sucked because I wasn't handling many idiosyncrasies of sample naming.
  - It was also very primitive and slow, since it would continually spawn `mv` processes just to move files. (Same for `mkdir`, but that is less of a concern since it's done less often.)
- 2023: I wrote a new script in Python. It was "better" but still didn't work.
  - I actually made the same mistake and tried to write it in Bash again, but even Python was worlds faster, so I rewrote it in Python.
- October 21, 2024: I decided to start rewriting my Python script so that it parses into an N-ary tree, for memory savings while still allowing memoization (and to be modular instead of one blob).
  - Later in the day, as an experiment, I rewrote the parsing algorithm (fixing a bug in the process) in C++. It was 100x faster, so I committed to a rewrite in C++.
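To give a rough idea of why an N-ary tree saves memory here, below is a minimal C++17 sketch, not the actual contents of `tree.hpp`. It assumes sample names are split into components on `.`; the names `Node`, `split_name`, `insert`, and `count_nodes`, as well as the sample names in the usage note, are made up for illustration. Each distinct prefix is stored once, no matter how many samples share it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
    // Children keyed by one name component, so a shared prefix is stored once.
    std::map<std::string, std::unique_ptr<Node>> children;
    bool is_sample = false;  // true if a full sample name ends at this node
};

// Split a name on '.' (an assumed separator), skipping empty components.
std::vector<std::string> split_name(const std::string& name) {
    std::vector<std::string> parts;
    std::istringstream in(name);
    std::string part;
    while (std::getline(in, part, '.')) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Insert one sample name, reusing nodes for components already seen.
void insert(Node& root, const std::string& name) {
    Node* cur = &root;
    for (const auto& part : split_name(name)) {
        auto& child = cur->children[part];
        if (!child) child = std::make_unique<Node>();
        cur = child.get();
    }
    if (cur != &root) cur->is_sample = true;
}

// Count all nodes below `node` (the node itself is not counted).
std::size_t count_nodes(const Node& node) {
    std::size_t n = 0;
    for (const auto& [key, child] : node.children) {
        assert(child);  // invariant: insert() never leaves a null child
        n += 1 + count_nodes(*child);
    }
    return n;
}
```

With the hypothetical names `Virus.Win32.Foo.a`, `Virus.Win32.Foo.b`, and `Virus.DOS.Bar`, the tree holds 7 nodes rather than the 11 components a flat list would store, because `Virus`, `Win32`, and `Foo` are shared.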
Building
make
Usage
- Generate a list of samples.
tar tf xxx/viruses-2010-05-18.tar.bz2 | sed 's/\.\///g' | awk NF | sort > list
is one option. It's not the best, but it's (basically) what I did.
- Run with
./vxorg list src/ dest/
  `dest/` will be created if it does not exist.
  - It will show a progress bar as it completes.