Growing demand for energy-efficient, high-performance systems has driven the development of innovative heterogeneous computing architectures that use FPGAs. FPGA-based architectures enable designers to implement custom instruction streams that execute on potentially thousands of compute elements. Traditionally, FPGAs have been used as compute elements on PCI devices; however, this arrangement does not allow the FPGAs to act as co-processors. This paper describes a high-performance system architecture based on the Intel® Xeon® platform in which one or more FPGAs, acting as application accelerators, replace one or more processors in a dual/multi-processor (DP/MP) platform. Each FPGA is thus connected directly to the Front Side Bus (FSB) and enjoys the same privileges as a processor: full participation in the coherency protocol, and unrestricted access to system memory and to other processors over the high-bandwidth, low-latency FSB connection. We also describe a software layer called the Accelerator Abstraction Layer (AAL), which provides a uniform, hardware- and/or platform-independent application interface. Applications written against the AAL can be ported, without modification, to multiple platforms that have different types of accelerators. The AAL also enables the developer or user to reprogram the FPGA on the fly (analogous to an operating system context switch), thereby exploiting the programmable nature of the FPGA. The resulting hardware/software stack creates a flexible and powerful platform for accelerator innovation and deployment.
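To make the role of an abstraction layer concrete, the following is a minimal sketch, not the actual AAL interface. Every name in it (`Accelerator`, `SimulatedFsbFpga`, `SoftwareFallback`, `load`, `run`, and the `FUNCTIONS` table) is hypothetical and chosen only for illustration. It assumes that an application sees a single uniform interface, that different back ends can sit behind it, and that loading a new function models on-the-fly reprogramming.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple

# Software stand-ins for the functions an FPGA would be programmed with.
# These names and behaviors are illustrative assumptions only.
FUNCTIONS: Dict[str, Callable[[List[int]], List[int]]] = {
    "double": lambda xs: [2 * x for x in xs],
    "negate": lambda xs: [-x for x in xs],
}


class Accelerator(ABC):
    """Hypothetical uniform interface; not the real AAL API."""

    @abstractmethod
    def load(self, function_name: str) -> None:
        """Select the function the accelerator should implement."""

    @abstractmethod
    def run(self, data: List[int]) -> List[int]:
        """Apply the currently loaded function to the data."""


class SimulatedFsbFpga(Accelerator):
    """Pretends to be a reprogrammable FPGA and counts reconfigurations."""

    def __init__(self) -> None:
        self._fn: Optional[Callable[[List[int]], List[int]]] = None
        self.reconfigurations = 0

    def load(self, function_name: str) -> None:
        # Models reprogramming the device with a different function.
        self._fn = FUNCTIONS[function_name]
        self.reconfigurations += 1

    def run(self, data: List[int]) -> List[int]:
        if self._fn is None:
            raise RuntimeError("no function loaded")
        return self._fn(data)


class SoftwareFallback(Accelerator):
    """A different back end behind the same interface."""

    def __init__(self) -> None:
        self._name: Optional[str] = None

    def load(self, function_name: str) -> None:
        if function_name not in FUNCTIONS:
            raise KeyError(function_name)
        self._name = function_name

    def run(self, data: List[int]) -> List[int]:
        if self._name is None:
            raise RuntimeError("no function loaded")
        return FUNCTIONS[self._name](data)


def application(acc: Accelerator) -> Tuple[List[int], List[int]]:
    """Application code that depends only on the abstract interface."""
    acc.load("double")
    doubled = acc.run([1, 2, 3])
    acc.load("negate")  # switch function mid-run, like a context switch
    negated = acc.run(doubled)
    return doubled, negated
```

Under these assumptions, `application` runs unchanged on either back end, which is the portability property claimed for applications written against an abstraction layer of this kind.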