PPoPP'07
PPoPP Keynote Talks
Parallel Programming Environment: A Key to Translating Terascale Platforms into a Big Success
Jesse Fang, Intel
Wednesday, March 14, 12:30–2:00 pm
Abstract
Moore’s Law will continue to increase the number of transistors on die for a couple of decades, as silicon technology moves from 65 nm today to 45 nm, 32 nm, and 22 nm in the future. Since power and thermal constraints increase with frequency, multi-core or many-core microprocessors will be the way of the future. In the near future, hardware platforms will have sixteen or more cores on die to achieve more than one tera instructions per second (TIPs) of computation power. These cores will communicate with each other through an on-die interconnect fabric with more than one Tb/s of on-die bandwidth and a latency of less than 30 cycles. Off-die D-cache will employ 3D stacked memory technology to greatly increase off-die cache/memory bandwidth and reduce latency. Fast copper flex cables will link the CPU and DRAM on the socket, and optical silicon photonics will provide up to one Tb/s of I/O bandwidth between boxes. A hardware system with TIPs of compute power operating on terabytes of data makes this a terascale platform.
What are the software implications of the hardware shift from uniprocessors to terascale platforms with many cores as the “way of the future”? It will be a great challenge for programming environments to help programmers develop concurrent code for most client software. A good concurrent programming environment should extend the existing programming languages that typical programmers are familiar with, and bring benefits for concurrent programming. There are many research topics. Example topics include flexible parallel programming models driven by the needs of applications; better synchronization mechanisms, such as transactional memory, to replace the simple “thread + lock” structure; nested data-parallel language primitives with new protocols; fine-grained synchronization mechanisms with hardware support; possibly fine-grained message passing; advanced compiler optimizations for threaded code; and software tools in the concurrent programming environment. A more interesting problem is how to use such a many-core system to improve single-threaded performance.
Bio
Jesse Fang is Director and Chief Scientist of the Programming System Lab at Intel/CTG (Corporate Technology Group). Jesse created the lab about 11 years ago, and has led it in developing programming environment technologies that enable Intel hardware microarchitecture research and microprocessor design, and in transferring software technologies to Intel’s Software Solution Group. Before joining Intel in 1995, Jesse was manager of the Hewlett-Packard Research Lab compiler team that initiated the Itanium Architecture in 1991. Between his time at HP and Intel, Jesse ran a small startup. Before HP Labs, Jesse worked on parallel/vector compilers as a technical leader at Concurrent Computer Corporation in 1986 and as a manager at Convex in 1989. Jesse Fang received his Ph.D. in Computer Science from the University of Nebraska-Lincoln and then did a postdoc at the University of Illinois Urbana-Champaign. He was an Assistant Professor at Wichita State University in Kansas before moving to industry. Jesse received his B.S. in mathematics from Fudan University in Shanghai.
Pervasive Parallel Computing: An Historic Opportunity for Innovation in Programming and Architecture
Andrew A. Chien, Intel
Friday, March 16, 8:30–9:30 am
Abstract
Parallel programming has been the subject of deep research for decades, and it is notorious in the software community as a difficult challenge, so much so that many companies have teams of parallelism and concurrency experts. Further, many ISVs explicitly design their software architectures to ensure that the majority of the development effort, including, of course, debugging and testing, can be done without consideration of parallelism. What makes parallelism so difficult are the knotty and coupled problems of correctness; performance, particularly data locality; and software modularity.
In terascale (many-core) chip-level multiprocessors, we are facing a pervasive and critical parallel programming challenge. Core counts on a single chip are expected to increase rapidly, progressing with Moore’s law, and quad-core systems are already available today in mainstream volume client and server platforms. To continue the rapid performance scaling to which we have become accustomed, applications will need to exhibit ample parallelism (and increasing amounts of it) for successive generations of hardware. Further, because the move to multiple-core parallelism as the primary basis for performance improvement is pervasive, this requirement falls on a wide range of applications, from those running on traditional large-scale commercial and HPC servers, desktops, and laptops to those running on small mobile devices. That breadth has numerous implications for the types of solutions that are required. We will discuss some of the requirements for terascale parallel programming solutions and point out several potentially fruitful directions. A number of these solutions will build on mainstream programming approaches (objects, modularity, imperative), particularly by introducing parallelism with modest disruption to both large-scale and local-scale program structure. However, there is also an opportunity for radically different approaches (e.g., functional) to take hold in the mainstream.
On the hardware front, there are several reasons why the parallel programming problem for terascale (many-core) systems is easier than for previous generations of multiprocessors (and can be much easier). The basic hardware characteristics of chip multiprocessors provide much greater opportunity for efficient coupling and coordination, as well as a tightly coupled memory system, simplifying a wealth of sophisticated scheduling and sharing structures. Further, the diminishing performance returns from larger single cores release the bounty of Moore’s law for new types of software-hardware innovation that support both parallel programming and higher-level programming in general. This is a huge opportunity to pioneer new approaches and solutions that are radically better than those widely used today.
We will close with some speculation on the rate of progress of parallel programming into the mainstream software community and some implications of such proliferation.
Bio
As a Vice President and Director of Intel Research, Dr. Andrew Chien oversees Intel’s exploratory research. This includes Intel’s innovative network of university research labs, and leadership of Intel’s research programs with universities and governments around the world. The portfolio of exploratory research projects spans a broad spectrum of technical areas including computer architecture, distributed systems, robotics, networking, communications, machine learning, human-computer interaction, ethnography and emerging markets.
Prior to joining Intel, from 1998 to 2005 Dr. Chien was the SAIC Endowed Chair Professor in Computer Science and Engineering at the University of California, San Diego and a Senior Fellow at the San Diego Supercomputer Center. He was the founding director of the Center for Networked Systems (CNS), a university-industry alliance focused on developing technologies for robust, secure, and open networked systems. From 1990 to 1998, Dr. Chien was a Professor at the University of Illinois and Senior Scientist at the National Center for Supercomputing Applications. For over 20 years, Dr. Chien has led research and development on high-performance computing systems, with expertise in networking, grids, high-performance clusters, distributed systems, computer architecture, high-speed routing networks, compilers, and object-oriented programming languages. Dr. Chien is an NSF Young Investigator, ACM Fellow, and IEEE Fellow.