In this work we introduce a novel meta-learning algorithm that learns to utilize the gradient information of auxiliary tasks to improve the performance of a model on a given primary task. Using a {\em small} training set with ``parallel labels,'' i.e., examples annotated with respect to both the primary and the auxiliary tasks, our method learns to project gradients computed on the auxiliary tasks onto the primary task. This strategy enables the learning of models with strong performance on the primary task by leveraging a large collection of auxiliary examples and only a few primary examples. Our scheme differs from transfer learning, multi-task learning, and domain adaptation in several ways: unlike na\"ive transfer learning, it uses auxiliary examples to directly optimize the model with respect to the primary task rather than the auxiliary task; unlike hard-sharing multi-task learning methods, it devotes the entire capacity of the backbone model to the primary task instead of splitting it across multiple tasks; unlike most domain adaptation techniques, it does not require any overlap in labels between the auxiliary and the primary tasks, thus enabling knowledge transfer between completely disjoint tasks. Experiments on two image analysis benchmarks involving multiple tasks demonstrate the performance improvements of our meta-learning scheme over na\"ive transfer learning, multi-task learning, and prior related work.
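As a rough, illustrative sketch (not the exact formulation used in this work), the primary-task update might take the form below, where $\theta$ denotes the backbone parameters, $\mathcal{L}_{\mathrm{pri}}$ and $\mathcal{L}_{\mathrm{aux}}$ are the primary and auxiliary losses, $\eta$ is a step size, and $P_\phi$ is a projection whose parameters $\phi$ are meta-learned on the small parallel-labeled set; all of these symbols are assumptions introduced here only for illustration.
\[
\theta \;\leftarrow\; \theta \;-\; \eta\,\Big(\nabla_\theta \mathcal{L}_{\mathrm{pri}}(\theta) \;+\; P_\phi\big(\nabla_\theta \mathcal{L}_{\mathrm{aux}}(\theta)\big)\Big)
\]
Under this reading, the parallel-labeled examples serve to fit $P_\phi$ so that projected auxiliary gradients point in directions that reduce the primary loss, while the bulk of the training signal still comes from the large auxiliary set.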