  FPGA Events at SC 2005

SC05 BOF-22: Programming standards for FPGAs in HPC Applications

Organizers and Presenters:
Aussie Schnore (GE Global R/D)
and
Malachy Devlin (Nallatech)

Note: The following is an approximate transcript of the presentation and discussion at the BOF. The content was generated from accumulated notes and not from a recorded dictation. Errors in content may be present.

OpenFPGA is targeted at a diverse audience. It includes those comfortable with FPGAs and those comfortable with HPC. This is the time to bring everyone together. The goal is to become official in 1Q 2006.

Who is it targeted at?
It's meant to bring together everyone. The area is too new not to listen to everyone.

After the presentation, if you want to get involved -- sign up at the website, join the working groups, or get on the mailing list to monitor.

Discussion of what an FPGA is.

Why the interest in FPGAs now? FPGAs have been around in the role of glue logic. In that respect they are very much akin to the microprocessor, whose early versions were used as simple controllers. Now FPGAs are seeing incredible increases in speed and density. Floating point is becoming viable and cost is going down.

Along with that come years of work on design tools. Algorithm-to-implementation tools have been improving at an immense rate.

Good stuff. But if performance isn't there, it doesn't matter.

Do FPGAs provide a motivation for a paradigm shift?

GE spends a lot of resources on CFD problems -- e.g. blade on a turbine.
The goal is to migrate to subsystem, system and full engine modeling in an appropriate turnaround time, ultimately increasing the maturity of the virtual prototype.

Here's a graph that shows a profile of the internal codes. This represents about 85% of HPC utilization at GE. Highlighted: Smoother, Euler and Viscous, together taking about 50% of the total.
Focusing on the Euler section.

Hours of time on a conventional computer are required. Eliminate the run time for this section and great run times can be achieved.
Caveats -- this doesn't improve data migration costs and doesn't include engineering costs. It is less impressive given that it takes 7 months to develop one routine.
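
The caveat about data migration can be made concrete with Amdahl's law. What follows is a rough sketch and not the presenters' calculation: the 50% share is the combined Smoother, Euler and Viscous figure quoted in the talk, used here only as a stand-in for the accelerated fraction, and the function name and the transfer overhead parameter are illustrative assumptions.

```python
def overall_speedup(fraction, kernel_speedup, transfer_overhead=0.0):
    """Amdahl's law with an optional data-migration term.

    fraction          -- share of the original run time spent in the hotspot
    kernel_speedup    -- how much faster the hotspot runs on the accelerator
    transfer_overhead -- extra time spent moving data, expressed as a share
                         of the original run time (hypothetical parameter)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    new_time = (1.0 - fraction) + fraction / kernel_speedup + transfer_overhead
    return 1.0 / new_time


# Assumed input: half of the run time is accelerated.
for kernel_speedup in (2, 14, 1000):
    print(kernel_speedup, round(overall_speedup(0.5, kernel_speedup), 2))
```

With half of the run time left on the CPU, the overall gain stays below 2x however fast the accelerated section becomes, and any data migration cost lowers it further.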

Why does the FPGA have potential?
Because of the ability to define I/O on the FPGA, the memory architecture and connectivity can be tuned to the application. The ability to make an application-specific cache also improves performance. Ultimately, one strings together FP ops until the pipeline is full, at which point all floating point ops in the pipeline complete one per clock cycle. Not to mention that one can also create the memory traffic pattern.
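
The "pipeline is full" point can be illustrated with simple cycle counting. This is a minimal sketch under assumed numbers (20 chained FP ops, a 60-stage pipeline, a 100 MHz clock); none of these values come from the talk, and the function names are hypothetical.

```python
def pipeline_cycles(n_results, depth):
    """Clock cycles for a fully pipelined datapath to emit n_results.

    The first result appears after `depth` cycles (pipeline fill);
    each later result follows one cycle after the previous one.
    """
    if n_results < 0 or depth < 1:
        raise ValueError("n_results must be >= 0 and depth >= 1")
    if n_results == 0:
        return 0
    return depth + n_results - 1


def sustained_flops(ops_in_pipeline, clock_hz, n_results, depth):
    """Average floating point ops per second, fill time included."""
    cycles = pipeline_cycles(n_results, depth)
    return ops_in_pipeline * n_results * clock_hz / cycles


# Assumed design: 20 FP ops chained in a 60-stage pipeline at 100 MHz.
for n in (10, 1_000, 1_000_000):
    print(n, f"{sustained_flops(20, 100e6, n, 60):.3e}")
```

For short data streams the fill time dominates; for long streams the rate approaches ops_in_pipeline times the clock rate, which is the regime the speaker describes.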

If the speed-up were for just one application, it would be exciting to only a subset of attendees. But research on the web showed areas where FPGAs are employed: Seismic, Genomic, Encryption, and N-body simulation. In fact, N-body simulations are on the floor.

Is the FPGA faster today? What about the future?
Look at the need to go to multicore for general purpose CPUs. This is why it's exciting. In fact, a multicore CPU taken to a ridiculous limit is an FPGA.

The FPGA is targeted to be an order of magnitude ahead of general purpose CPUs in a short amount of time. The general limitation is bandwidth; does the FPGA have the upper hand?

Great, the FPGA is the way to boost performance. Still, how do I program the FPGA?
Turns out software processes are part of the key. Run a compiler, identify hotspots and target these for acceleration.
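
One common way to identify hotspots is to profile the existing software. The sketch below uses Python's standard cProfile and pstats modules purely for illustration; the functions smoother, euler and solver are toy stand-ins named after the routines mentioned in the talk, not the GE codes, and find_hotspots is a hypothetical helper.

```python
import cProfile
import pstats


def smoother(n):
    # Toy stand-in for a cheap routine.
    return sum(i * 0.5 for i in range(n))


def euler(n):
    # Toy stand-in for the expensive routine.
    total = 0.0
    for i in range(n):
        total += (i * i) % 7
    return total


def solver():
    smoother(5_000)
    euler(500_000)


def find_hotspots(func, top=3):
    """Profile func() and return (function name, own time) pairs, slowest first."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    rows = [
        (name, tottime)
        for (_file, _line, name), (_cc, _nc, tottime, _ct, _callers)
        in stats.stats.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


for name, seconds in find_hotspots(solver, top=5):
    print(f"{name:12s} {seconds:.4f} s")
```

The routines at the top of such a list are the candidates to pull out and target for acceleration.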

What has to be done additionally to use an FPGA?
Take out the hotspot and develop it. Go through the FPGA flow to create a binary for the FPGA that needs to be reintegrated into the software. The FPGA flow has in the past used HDL. In its simplest form, HDL is a schematic of digital logic captured in text form.

Take the Euler code example. It took 7 months for this process to get to FPGA level. Not acceptable, given that 1 month is the window within which demonstrable results are required.

Thankfully, smart people are building tools to capture this and bring it down to hardware. At best, it takes a software specification and a port.
At worst, it takes a code rewrite to expose data to take advantage of the FPGA hardware. It took about 5 weeks to build Euler with the tools, and a 14x improvement was achieved.

HLL Roundup --

Partial list of vendors; each has different underlying assumptions about the hardware. Not an exhaustive list. If you are aware of other efforts, get on the OpenFPGA website. If you are interested in these, many of the vendors are in the audience or on the floor.

But there are still problems.

Projects diverge into multiple code bases.
Boundary between SW and HW is in flux.
FPGA is dependent on high level architectural designs.

Have tools, but still have problems. How are these going to mature?

What is keeping FPGAs from going mainstream?
Had discussion at MAPLD and had great feedback.

<Turn over to Malachy Devlin>

With applications you can do many great things. Still, it tends to be in the research community. What is required to take it mainstream?

Sam Allisant from IBM -- one of the factors keeping it from going mainstream: competition against microprocessors. The investment made in tools is much bigger for microprocessors.

Who is the primary funding force behind the OpenFPGA organization? OSC was gracious to donate effort to start it. It's a grass roots organization. Comment: Based on experience in standards committees, it needs a funding source and motivation to keep it going.

We are looking to take it to the next stage.

How do these languages compare to Verilog? Verilog is an HDL. System Verilog and VHDL type tools.

MAPLD had a list of things to standardize. The obvious thing is to have the same API, to make C calls easier. On the FPGA end, a standard interface for FIFOs. Every interconnect will be different, but a standard FIFO could hide the differences between vendors.
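
To show what hiding vendor differences behind a standard FIFO might look like from the application side, here is a minimal sketch. The Fifo interface, the SoftwareFifo stand-in and stream_through are all hypothetical names invented for illustration; they are not an OpenFPGA or vendor API.

```python
from abc import ABC, abstractmethod
from collections import deque


class Fifo(ABC):
    """Hypothetical vendor-neutral FIFO interface (illustration only)."""

    @abstractmethod
    def write(self, word):
        """Queue one word; return False if the FIFO is full."""

    @abstractmethod
    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""


class SoftwareFifo(Fifo):
    """Pure-software stand-in; a vendor backend would wrap its own interconnect."""

    def __init__(self, depth):
        self._depth = depth
        self._words = deque()

    def write(self, word):
        if len(self._words) >= self._depth:
            return False
        self._words.append(word)
        return True

    def read(self):
        return self._words.popleft() if self._words else None


def stream_through(fifo, words):
    """Application code that depends only on the Fifo interface."""
    received = []
    for word in words:
        if not fifo.write(word):          # full: drain one word, then retry
            received.append(fifo.read())
            fifo.write(word)
    while (word := fifo.read()) is not None:
        received.append(word)
    return received


print(stream_through(SoftwareFifo(depth=2), [1, 2, 3, 4, 5]))
```

The application function never names a vendor, so swapping SoftwareFifo for a hardware-backed class would not require touching it.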

Need to come up with a thorough list of what these are.

Wondering if there was a difference in clock-rate and density for 5 week design relative to 7 month effort?
The answer is yes. The 5 week one was faster and had better utilization.
There was an issue in address generation. It wasn't optimal. The tool was able to build and use best practice. Insert the best of the best using the tools.

From the company's perspective, what are things to do to sell it inside?
What can you not do? Tweak existing code or does a new language need to be learned?
If the answer is to tweak an application and recompile it, it will go over much easier with management, rather than the alternative to describe a new hardware flow.

Goal is to get things into the hands of the domain experts. Get FORTRAN guy to be able to redo what he does.

Also a lack of hardware inhibits internal use.

Two things would help a lot. Standardize language across platforms.

This is becoming less and less of an issue. Handel-C, VIVA, Mitrion-C and Impulse-C support multiple hardware platforms.

Another thing that might help: enable targeting non-optimal code for simulation on a GP CPU.

So basically there are two sides: the user side and the vendor side. Vendors want to differentiate, with special functions to port to memory. Users fall into two camps: software users and hardware users. I want 2-bit instead of 64-bit.
There are so many aspects. Not sure the right course is to go through a common language. It may abstract it out too much.

Coming from a hardware background, the motivation to tune is strong with a hardware person. I have to say I am impressed with HLL results in the past couple of years. Think of it as being very similar to HLL versus assembly language.

Past problems include the need to pay $100,000 for specific equipment, want to use vendor specific capabilities.

Arguments are similar to the MPI arguments. Vendors were very concerned about their own sends and receives. But have to realize that MPI is successful because we did agree on a standard, so programs can be written. If you can get to at least basic sets, then a C extension or C API, a lot will be gained. It may not be satisfactory to everyone. MPI was funded by NSF with university and vendor support. Don't define yourself away.

MPI was defined to be POSIX aware. Rusty Lusk from Argonne made sure MPI was to be thread aware.

Most applications outlive the hardware. If one is going to invest in an accelerator, what are the options when the accelerator is obsolete? It's a very real issue. MPI has allowed apps to move on to new hardware. Need to emulate that type of transparency.

There is an analogy with FPGAs. Can't port until it can be compiled.

Kevin from FPGA Journal. Need three components to get an application developed. Here people are getting confused. Start with a procedural language.
The first problem, parsing and elaborating, is pretty well solved.
The second problem, parallelizing application logic, is very hard. A few people are working on it. The third problem is operating systems, which are abstractions from hardware. This applies here as well. The least work has been done here, and we are just scratching the surface now.

Only way is to have cake and eat it too. First is a HLL with no hardware at all. A second language which is application specific. If it's within code, then it's hard to scale up to new hardware.

GE has huge pieces of code that would be good to move. Problem is loss of original developers. In the next piece of this, talk about standards.
Standards are an end goal.
I think one thing keeping FPGAs from going mainstream is the required shift in methodology. C programmers are living with hardware limitations. Need to provide a list of service providers who have expertise combining FPGAs and microprocessors. Here is a list of third party experts to consult with and get started for the appropriate application.

A caution. Is the problem compute bound or I/O bound? Audio processing, for example.
One solution is using a byte code that is more compact than the data.
Most efforts are library acceleration. Can we look at work for domain specific processors?
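
The compute bound versus I/O bound question can be checked with back-of-envelope arithmetic before any FPGA work starts. The sketch below is an illustration only: the link bandwidth, accelerator rate and ops-per-sample figures are assumed values, not numbers from the discussion, and classify_kernel is a hypothetical helper.

```python
def classify_kernel(n_bytes, n_ops, bandwidth_bytes_per_s, ops_per_s):
    """Compare data transfer time with compute time for one offloaded kernel."""
    transfer_s = n_bytes / bandwidth_bytes_per_s
    compute_s = n_ops / ops_per_s
    bound = "I/O bound" if transfer_s > compute_s else "compute bound"
    return bound, transfer_s, compute_s


# Assumed workload: one second of 48 kHz stereo 16-bit audio,
# with 10 operations per sample.
samples = 48_000 * 2
n_bytes = samples * 2
n_ops = samples * 10

# Assumed hardware: 100 MB/s link to the accelerator, 1 Gop/s on the accelerator.
bound, transfer_s, compute_s = classify_kernel(n_bytes, n_ops, 100e6, 1e9)
print(bound, f"transfer={transfer_s:.5f} s", f"compute={compute_s:.5f} s")
```

Under these assumptions the transfer takes longer than the computation, so a faster accelerator would not help; shrinking the data moved, as the byte code suggestion aims to do, would.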

It may be possible to allow creating a virtual machine specific to the instructions. Create a "super operator". The whole idea in VM space is great. A company called AZOUL [?] is looking to add primitives with virtual machines. FPGAs are a means to an end.

What about an SRAM in core? Seems FPGAs are clunky and big.

Actually, there are examples that have taken FPGA fabric and put it in the instruction path.
Even Xilinx has PowerPC hard cores and so forth. Many solutions are possible here.

Talked about CFD code, but FPGAs are also used for advanced CT reconstruction. In this situation, we had to create a very different architecture.

Maybe question, are there examples of domain specific processors?

Want to be more productive. Go to the other extreme. Thousands of lines of code for the HLL. The question for the FPGAs, new models comes back to standards. What bits to standardize, what's the order?

Maya listed several of them and they are a good start. Has anyone else had frustrations which would be better suited to standards? Hard enough to learn one thing.

David Peller -- standardized interfaces for toolflows that can work together. Just like OS layering. Get the interconnect well defined. Some standardized specification that various tools can write to.

Hearing standards for FPGA? What about higher level standards?

GMU and GW University -- trying to create a standard set of tools to deploy across platforms. Challenging when approaches are very different and different pins are blocked off. Need to define some pins accepted by vendors, partial reconfiguration, etc.

Dicey call, a specific processor but generic enough to be portable.

Absolutely right.

All agree that FPGAs and processors will sit side by side.

At EPCC -- don't want to learn VHDL.
If FPGAs are really going to work, need to give people constructs. Not saying there should be MPI constructs, but need constructs.

Dave Pointer -- NCSA -- hearing a lot of vendors giving us what we want.
Throw back to the research community. Throw back and make the case and vendors will provide it.

Discussion continued with a review of the working groups forming.