Localization by image retrieval is inexpensive and scalable due to its simple mapping and matching techniques. The localization accuracy, however, depends on the quality of the underlying image features, often obtained using contrastive learning. Most contrastive learning strategies learn features that distinguish between different classes. In the context of localization, however, there is no natural definition of classes. Therefore, images are artificially separated into positive/negative classes with respect to the chosen anchor images, based on some geometric proximity measure. In this paper, we show why such divisions are problematic for learning localization features. We argue that any artificial division based on a proximity measure is undesirable due to the inherently ambiguous supervision for images near the proximity threshold. To avoid this problem, we propose a novel technique that uses soft positive/negative assignments of images for contrastive learning. Our soft assignment makes a gradual distinction between close and far images in both geometric and feature space. Experiments on four large-scale benchmark datasets demonstrate the superiority of our soft contrastive learning over the state-of-the-art method for retrieval-based visual localization.
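To make the contrast between hard and soft assignment concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a Gaussian falloff of the geometric distance as the soft positive weight and a standard margin-based contrastive loss in feature space. The names `soft_assignment`, `hard_assignment`, and `weighted_contrastive_loss`, and the parameters `sigma`, `threshold`, and `margin`, are all hypothetical and chosen for illustration only.

```python
import numpy as np


def hard_assignment(geo_dist, threshold=10.0):
    """Binary labels: 1.0 (positive) if within the threshold, else 0.0 (negative)."""
    return (np.asarray(geo_dist, dtype=float) <= threshold).astype(float)


def soft_assignment(geo_dist, sigma=10.0):
    """Assumed soft positive weight in (0, 1]: decays smoothly with geometric distance."""
    geo_dist = np.asarray(geo_dist, dtype=float)
    return np.exp(-0.5 * (geo_dist / sigma) ** 2)


def weighted_contrastive_loss(feat_dist, weight, margin=1.0):
    """Margin-based contrastive loss with a per-pair positive weight.

    weight = 1 pulls the pair together, weight = 0 pushes it apart up to
    the margin, and intermediate weights blend the two terms.
    """
    feat_dist = np.asarray(feat_dist, dtype=float)
    weight = np.asarray(weight, dtype=float)
    pull = weight * feat_dist ** 2
    push = (1.0 - weight) * np.maximum(0.0, margin - feat_dist) ** 2
    return float(np.mean(pull + push))


if __name__ == "__main__":
    # Two images just on either side of the (hypothetical) 10 m threshold.
    geo = np.array([9.9, 10.1])
    feat = np.array([0.5, 0.5])
    print("hard labels:", hard_assignment(geo))
    print("soft weights:", soft_assignment(geo))
    print("loss (hard):", weighted_contrastive_loss(feat, hard_assignment(geo)))
    print("loss (soft):", weighted_contrastive_loss(feat, soft_assignment(geo)))
```

Under these assumptions, two images that lie just on either side of the threshold receive opposite labels from the hard assignment but nearly identical weights from the soft one, which is the ambiguity near the proximity threshold that the abstract refers to.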