Redshifted broad absorption line quasars found via machine-learned spectral similarity

We report the discovery of 31 new redshifted broad absorption line quasars (RSBALs) in the Sloan Digital Sky Survey (SDSS); only 19 such objects were previously known. We identified the new objects by calculating similarities between SDSS quasar spectra and searching for objects similar to those in the original sample, which required visually inspecting only hundreds of the more than 160,000 spectra considered. We compare the performance of several similarity measures, and of different methods of employing them, in finding RSBALs. Decision-tree-based similarities recover the most objects, and an ensemble of methods performs better than any single one. Because the similarities are not tailored to the specific problem of finding RSBALs, they could also be used to search for other types of quasars. The similarities and the code for their calculation are available online.
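
The search strategy described in the abstract (take the known objects, pull their nearest neighbours under each similarity measure, and merge the resulting lists for visual inspection) can be illustrated with a minimal sketch. This is not the authors' code: the function names `nearest_candidates` and `ensemble_candidates`, the cutoff `k`, and the tiny similarity matrices are all hypothetical, and a plain union is only one assumed way of combining methods.

```python
from typing import Dict, Sequence, Set

Matrix = Sequence[Sequence[float]]  # similarity[i][j]: larger means more alike


def nearest_candidates(similarity: Matrix, known: Sequence[int], k: int) -> Set[int]:
    """For each known object, collect its k most similar not-yet-known objects."""
    known_set = set(known)
    candidates: Set[int] = set()
    for i in known:
        # Rank every object outside the known sample by decreasing similarity to i.
        others = [j for j in range(len(similarity)) if j not in known_set]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        candidates.update(others[:k])
    return candidates


def ensemble_candidates(similarities: Dict[str, Matrix],
                        known: Sequence[int], k: int) -> Set[int]:
    """Merge (by union, an assumption) the candidates found by each measure."""
    merged: Set[int] = set()
    for matrix in similarities.values():
        merged |= nearest_candidates(matrix, known, k)
    return merged


# Toy data (made up): four spectra, object 0 is the only known example.
sim_a = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.4],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.4, 0.2, 1.0],
]
sim_b = [
    [1.0, 0.3, 0.2, 0.8],
    [0.3, 1.0, 0.1, 0.5],
    [0.2, 0.1, 1.0, 0.4],
    [0.8, 0.5, 0.4, 1.0],
]

to_inspect = ensemble_candidates({"a": sim_a, "b": sim_b}, known=[0], k=1)
print(sorted(to_inspect))  # [2, 3]
```

In this toy case each measure alone proposes a single, different candidate, while the union proposes both. It shows how combining measures can widen recall at the cost of a longer inspection list.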
