Applications of Machine Learning Algorithms in Processing Terahertz Spectroscopic Data

We present the data reduction software and the distribution of the Level 1 and Level 2 data products of the Stratospheric Terahertz Observatory 2 (STO2). STO2, a balloon-borne terahertz telescope, surveyed star-forming regions and the Galactic plane and produced approximately 300,000 spectra. Most of these spectra resemble those typically produced by single-dish radio telescopes. However, a fraction of the data contained rapidly varying fringe/baseline features and drift noise that could not be adequately corrected with conventional data reduction software. To process the entire STO2 science dataset, we adopted a new method for selecting appropriate off-source spectra, which reduces large-amplitude fringes, together with new algorithms: Asymmetric Least Squares (ALS), Independent Component Analysis (ICA), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The STO2 data reduction software efficiently reduced fringe amplitudes from a few hundred K to 10 K and brought residual baseline amplitudes down to a few K. The Level 1 products typically have a noise level of a few K in the [CII] spectra and ~1 K in the [NII] spectra. Using a regridding algorithm with a Bessel-Gaussian kernel, we produced spectral maps of the star-forming regions and of the Galactic plane survey. The Level 1 and Level 2 products are available to the astronomical community through the STO2 data server and the DataVerse, and the software is publicly accessible through GitHub. The detailed addresses are given in Section 4 of the paper, which describes data distribution.
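Asymmetric Least Squares baseline estimation, as commonly formulated (Eilers and Boelens), finds a smooth baseline z for a spectrum y by minimizing sum_i w_i (y_i - z_i)^2 + lam * sum_i (second difference of z at i)^2, where w_i = p if y_i > z_i and 1 - p otherwise. Because emission above the baseline receives the small weight p, lines pull on the fit only weakly. The block below is a minimal NumPy sketch of that general technique, not the STO2 implementation; the function name `als_baseline` and the defaults `lam=1e5`, `p=0.01`, and `n_iter=10` are illustrative assumptions.

```python
import numpy as np

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric Least Squares baseline (illustrative sketch, hypothetical defaults).

    y   : 1-D spectrum (e.g. antenna temperature per channel)
    lam : smoothness penalty; larger values give a stiffer baseline
    p   : weight for channels above the current baseline (0 < p << 1)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator D with shape (n - 2, n); rows look like [1, -2, 1]
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Solve the weighted, penalized normal equations (W + lam D^T D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Asymmetric reweighting: points above the fit (likely emission) get weight p
        w = np.where(y > z, p, 1.0 - p)
    return z
```

A usage example on a synthetic spectrum (a linear slope plus a broad ripple, one Gaussian line, and Gaussian noise, all invented for illustration):

```python
rng = np.random.default_rng(0)
i = np.arange(300)
true_base = 4.0 + 0.02 * i + 3.0 * np.sin(np.pi * i / 299)
line = 20.0 * np.exp(-0.5 * ((i - 150) / 3.0) ** 2)
y = true_base + line + rng.normal(0.0, 0.3, i.size)
residual = y - als_baseline(y)   # baseline-subtracted spectrum
```

Because p is small, the fitted baseline tends to settle slightly below the noise, so line-free channels of the residual carry a small positive offset; lam trades baseline stiffness against the ability to follow fast ripples such as fringes. The dense solve is O(n^3) and only suitable for a sketch; sparse matrices are the usual choice for long spectra.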
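The abstract names Independent Component Analysis without describing how STO2 applied it. As a hedged illustration of the technique itself, the sketch below implements deflationary FastICA with a tanh nonlinearity, which recovers statistically independent source signals from linear mixtures of them. In a spectral setting one might, for example, treat a set of spectra as mixtures of a few shared fringe or drift patterns; that framing, the function name `fast_ica`, and all parameters are assumptions, not the STO2 recipe.

```python
import numpy as np

def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA with the tanh (log-cosh) nonlinearity (illustrative sketch).

    X : (n_signals, n_samples) array of observed mixtures.
    Returns an (n_components, n_samples) array of estimated sources, recovered
    only up to permutation and sign (an inherent ambiguity of ICA).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)            # center each mixture
    # Whitening: project onto leading eigenvectors and rescale to unit variance
    eigval, eigvec = np.linalg.eigh(np.cov(X))
    order = np.argsort(eigval)[::-1][:n_components]
    K = (eigvec[:, order] / np.sqrt(eigval[order])).T
    Z = K @ X                                          # whitened data
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, n_components))
    for c in range(n_components):
        w = rng.normal(size=n_components)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(w @ Z)
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, with g' = 1 - tanh^2
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Deflation: remove projections onto components already found
            w_new -= W[:c].T @ (W[:c] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[c] = w
    return W @ Z
```

A usage example with two synthetic sources (a sinusoid and a square wave) mixed by an arbitrary, hypothetical matrix:

```python
t = np.linspace(0, 50, 5000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
X = np.array([[1.0, 0.5], [0.4, 1.0]]) @ S
estimated = fast_ica(X, n_components=2)
```

Each row of `estimated` should match one row of `S` up to order and sign, so comparisons are best made through absolute correlation coefficients.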
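DBSCAN groups points that lie in dense regions (at least `min_samples` points, including the point itself, within radius `eps`) and labels isolated points as noise. The abstract does not state which quantities STO2 clustered; purely as an illustration, the sketch below clusters hypothetical per-spectrum feature vectors, so that spectra falling in no dense cluster (label -1) could be flagged as outliers. The function name `dbscan`, the brute-force distance matrix, and the feature choice are assumptions made for this sketch.

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Minimal DBSCAN (illustrative sketch). Returns integer labels; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Brute-force pairwise Euclidean distances: O(n^2) memory, fine for small n
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]   # includes the point itself
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue          # border points join the cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels
```

A usage example with two invented clusters of feature vectors and one outlier:

```python
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(20, 2)),
    rng.normal([5.0, 5.0], 0.1, size=(20, 2)),
    [[20.0, -20.0]],
])
labels = dbscan(features, eps=0.5, min_samples=4)
```

Unlike k-means, DBSCAN needs no preset number of clusters and leaves points outside any dense region unassigned, which is the property that makes it attractive for flagging anomalous data.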
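Regridding with a convolution kernel computes each map pixel as a kernel-weighted average of the spectra observed near it: T(pixel) = sum_i K(r_i) T_i / sum_i K(r_i), where r_i is the distance from the pixel to sample i. The sketch below uses one common Bessel-Gaussian form, a jinc J1(x)/x with x = pi r / a, tapered by exp(-(r/b)^2) and truncated at r_max. The exact kernel form, the parameters `a`, `b`, `r_max`, and the helper names `bessel_j1`, `bessel_gauss_kernel`, and `regrid` are assumptions, not the values or code used for STO2. To stay dependency-free, J1 is evaluated from Bessel's integral rather than a special-function library.

```python
import numpy as np

def bessel_j1(x, n_tau=256):
    """J1 via Bessel's integral, J1(x) = (1/pi) * int_0^pi cos(tau - x sin tau) dtau,
    evaluated with the midpoint rule."""
    tau = (np.arange(n_tau) + 0.5) * np.pi / n_tau
    x = np.asarray(x, dtype=float)
    return np.cos(tau - x[..., None] * np.sin(tau)).mean(axis=-1)

def bessel_gauss_kernel(r, a, b, r_max):
    """Jinc tapered by a Gaussian, [J1(pi r/a) / (pi r/a)] * exp(-(r/b)^2), zero beyond r_max."""
    r = np.asarray(r, dtype=float)
    x = np.pi * r / a
    safe_x = np.where(x == 0, 1.0, x)                          # avoid 0/0 at the centre
    jinc = np.where(x == 0, 0.5, bessel_j1(safe_x) / safe_x)   # J1(x)/x -> 1/2 as x -> 0
    return np.where(r <= r_max, jinc * np.exp(-(r / b) ** 2), 0.0)

def regrid(x, y, spectra, grid_x, grid_y, a, b, r_max):
    """Convolve irregularly sampled spectra onto a regular map (illustrative sketch).

    x, y: (n,) sample positions; spectra: (n, n_chan).
    Returns a (len(grid_y), len(grid_x), n_chan) cube; pixels with no sample
    within r_max are NaN.
    """
    x, y, spectra = (np.asarray(v, dtype=float) for v in (x, y, spectra))
    gx, gy = np.meshgrid(grid_x, grid_y)                       # (ny, nx)
    r = np.hypot(gx[..., None] - x, gy[..., None] - y)         # (ny, nx, n)
    w = bessel_gauss_kernel(r, a, b, r_max)
    wsum = w.sum(axis=-1)[..., None]
    cube = np.einsum("yxn,nc->yxc", w, spectra)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(wsum > 0, cube / wsum, np.nan)
```

Because the jinc has negative sidelobes beyond its first null (at r of about 1.22 a in this parameterization), keeping r_max at or below that radius keeps all weights non-negative. A usage example with invented positions and a constant three-channel spectrum, which a normalized kernel average should reproduce exactly:

```python
rng = np.random.default_rng(2)
xs, ys = rng.uniform(0, 10, 400), rng.uniform(0, 10, 400)
spectra = np.tile([1.0, 2.0, 3.0], (400, 1))
cube = regrid(xs, ys, spectra, np.arange(1, 10), np.arange(1, 10), a=1.0, b=1.5, r_max=1.0)
```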