This talk focuses on several related research projects that aim to improve the performance of HPC, cloud, and emerging big data applications. It is an understatement to say that the performance of multicore systems is limited by the performance of their memory systems. Our research has developed both hardware and software solutions to improve the performance of cache memories and to integrate new memory technologies, such as 3D-stacked DRAM and phase-change memory, into the memory hierarchy. Software solutions include profiling data access patterns, relocating data, and restructuring code to improve performance. Hardware solutions include heterogeneous memories built from different memory technologies and near-data processing (also known as processing in memory, or PIM). We are evaluating different architectures as PIM elements, including dataflow, coarse-grained reconfigurable systems, GPUs, and simple in-order RISC processors.