To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics