A Machine Learning Approach to Predict Missing Flux Densities in Multi-band Galaxy Surveys


We present a new method based on information theory to find the optimal number of bands required to measure the physical properties of galaxies with a desired accuracy. As a proof of concept, using the recently updated COSMOS catalog (COSMOS2020), we identify the most relevant wavebands for measuring the physical properties of galaxies in a Hawaii Two-0 (H20)- and UVISTA-like survey for a sample of $i<25$ AB mag galaxies. We find that, given available $i$-band fluxes, the $r$, $u$, IRAC/$ch2$ and $z$ bands provide most of the information about the redshift, with importance decreasing from the $r$-band to the $z$-band. We also find that for the same sample, the IRAC/$ch2$, $Y$, $r$ and $u$ bands are the most relevant for stellar mass measurements, in decreasing order of importance. Investigating the inter-correlation between the bands, we train a model to predict UVISTA observations in the near-IR from H20-like observations. We find that magnitudes in the $YJH$ bands can be predicted with a $1\sigma$ scatter of $\lesssim 0.2$ mag for galaxies brighter than 24 AB mag in the near-IR bands. These conclusions depend on the selection criteria of the sample; for any new sample of galaxies with a different selection, the results should be remeasured. Our results suggest that when only a limited number of bands is available, a machine learning model trained on the population of observed galaxies with extensive spectral coverage outperforms template-fitting. Such a model captures as much as possible of the information acquired by existing extensive surveys and breaks the degeneracies in the template-fitting parameter space that are inevitable when only a few bands are available.
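To illustrate the idea of ranking bands by how much information they carry about a physical property, the following is a minimal sketch that scores each input by its mutual information with a target quantity. It is not the estimator used in this work, which the abstract does not specify: the histogram (plug-in) estimator, the function names `mutual_information` and `rank_bands`, the band labels `band_A`, `band_B`, `band_C`, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X; Y) in nats. Illustrative assumption."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def rank_bands(features, target, bins=16):
    """Sort hypothetical band features by mutual information with the target."""
    scores = {name: mutual_information(values, target, bins)
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in data (NOT from COSMOS2020): a target quantity and three
# made-up features with strong, weak, and no dependence on it.
rng = np.random.default_rng(0)
n = 5000
target = rng.uniform(0.0, 3.0, n)
features = {
    "band_A": 0.8 * target + rng.normal(0.0, 0.1, n),  # strongly informative
    "band_B": 0.8 * target + rng.normal(0.0, 1.0, n),  # weakly informative
    "band_C": rng.normal(0.0, 1.0, n),                 # uninformative
}

ranking = rank_bands(features, target)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

A pairwise ranking like this ignores redundancy between bands. Selecting bands given those already available (for example, given $i$-band fluxes) would need a conditional quantity, which this sketch does not attempt.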

arXiv e-prints