This course focuses on the efficient use of modern parallel systems, ranging from multi-core processors and many-core accelerators to large-scale distributed memory clusters. The course puts equal emphasis on the theoretical foundations of parallel computing and on the practical aspects of different parallel programming models. It begins with a survey of common parallel architectures and types of parallelism, followed by an overview of formal approaches to assessing the scalability and efficiency of parallel algorithms and their implementations. In the second part, the course covers the most common current parallel programming techniques and APIs for shared address space systems, many-core accelerators, and distributed memory clusters. Each component of the course involves solving practical computational and data-driven problems, such as basic algorithms like sorting and searching, and numerical data analysis problems.