We propose HyperVAE, a framework for encoding distributions of distributions. Each target distribution is modeled by a VAE whose neural-network parameters $\theta$ are drawn from a distribution $p(\theta)$, which is in turn modeled by a hyper-level VAE. Given a target distribution, we predict the posterior distribution of its latent code and then use a matrix-network decoder to generate a posterior distribution $q(\theta)$ over the target-network parameters. In contrast to common hypernetwork practice, which generates only scale and bias vectors to modulate the target network, HyperVAE encodes the parameters $\theta$ in full and thus preserves information about the model for each task in the latent space. We evaluate HyperVAE on density estimation, outlier detection, and the discovery of novel design classes.
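Concretely, the two-level construction above can be read as the following generative process; the notation here (a hyper-latent $z$, a matrix-network decoder $G$, target-VAE latents $u_i$, and a task dataset $\mathcal{D}=\{x_i\}_{i=1}^{N}$) is an illustrative sketch rather than the paper's exact formulation:
\begin{align*}
  z &\sim p(z), &\quad \theta &= G(z) \quad \text{(matrix-network decoder)}, \\
  u_i &\sim p(u), &\quad x_i &\sim p_{\theta}(x \mid u_i), \qquad i = 1, \dots, N,
\end{align*}
with inference amortized through a posterior $q(z \mid \mathcal{D})$ over the hyper-latent, which the decoder $G$ maps to the posterior $q(\theta)$ over target-network parameters.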