MEMORANDUM

TO: RESTRICTED DPG
FROM: J.J. BUNN
SUBJECT: VISIT TO DIGITAL AND MIT IN BOSTON 27/8 TO 30/8
DATE: OCTOBER 11, 1996
CC:

The purpose of the trip was to gather information on computing technology trends for eventual use in the Computing Technical Proposals of the LHC experiments. Originally it had been planned that Philippe Gaillot/Digital, CW Hobbs/Digital and Sverre Jarp would also travel, but due to some administrative difficulties I was accompanied only by CW Hobbs at the Digital sites, and visited the MIT sites by myself.

The information from Digital is strictly NON-DISCLOSURE. For this reason I ask that you do not copy or forward this trip report to other people.

Digital Semiconductor, Hudson

The Digital Alpha FAB (chip fabrication plant) will be producing 0.18µ chips (referred to as EV8) by the year 2000. Currently the FAB is about to produce the first EV6 chips. The chip masks, which are used to produce the die for a given generation of chips, can be photographically shrunk, with some engineering trade-offs, allowing an intermediate speed increase before the next generation design. So the sequence we expect to see is EV5, EV56, EV6, EV67, EV7, EV78, EV8, where the intermediates are simply shrinks of the previous generation. It is already known that EV7 will run at around 1 GHz clock speed.

Digital believe that their chip verification and validation procedures are superior to those of the other manufacturers, since they use a dynamically generated pseudo-random test suite rather than relying on a fixed set of tests. Although they are playing "catch-up" with Intel, they believe they will always have the faster chip, and that is the corporate goal. The designers we met had only sketchy information on the Intel P7 ("Merced") chip, but apparently this chip will not natively support the X86 instruction set, which struck me as surprising…

We were given a very detailed overview of the features of the new EV6 Alpha. The first silicon is scheduled for November 1996. They have added a new instruction, square root, at the request of Cray (who use the Alpha chip in one of their supercomputers). Digital said they were very open to requests for new instructions, and I racked my brains trying to think of a suitable suggestion for an oft-used HEP instruction, but drew a blank!

Memory bandwidth on the EV6 chip will be much improved over EV5 performance. This parameter is evaluated by the McCalpin benchmark, which rates the EV6 at about 1800 Mbytes/sec, compared with a few hundred Mbytes/sec for other chips (including the EV5 Alpha). The EV7 chip will probably increase this further, to around 5 Gbytes/sec.

Digital have a very nice software product called FX!32 which runs on Alpha NT machines and dynamically translates Intel binary applications. This means that one can run, for example, the Intel version of Visual Basic on an Alpha. FX!32 is clever in that it accumulates a database of information on the Intel binaries that are run, which enables it to identify hot spots and gradually increase the translated application's speed. Unfortunately, it does not yet support X86 device drivers, so one still requires native Alpha drivers for graphics cards, sound cards, and so on. Digital are keen to propagate the notion that an Alpha PC with FX!32 is simply a very fast Intel PC. My personal experience (gained, however, on an old 150 MHz "Jensen" Alpha PC), together with limitations like the lack of device driver support, makes me believe that they are fighting a losing battle.
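To make the profile-then-translate idea a little more concrete, the toy sketch below (in Python, and emphatically not FX!32's actual mechanism) keeps a database of execution counts for some hypothetical "guest" blocks: each block is evaluated slowly from source at first, and once the counts mark it as hot it is compiled once, the cached result standing in for emitted native Alpha code.

    from collections import Counter

    HOT_THRESHOLD = 3                     # assumed value, purely illustrative

    GUEST_BLOCKS = {                      # stand-in for basic blocks of an Intel binary
        "blockA": "x * 2 + 1",
        "blockB": "x * x - 4",
    }

    profile = Counter()                   # accumulated "database" of execution counts
    translated = {}                       # cache of "translated" (pre-compiled) blocks

    def run_block(name, x):
        if name in translated:                              # fast, "translated" path
            return eval(translated[name], {"x": x})
        profile[name] += 1
        if profile[name] >= HOT_THRESHOLD:                  # hot spot identified
            translated[name] = compile(GUEST_BLOCKS[name], name, "eval")
        return eval(GUEST_BLOCKS[name], {"x": x})           # slow, "interpreted" path

    for i in range(5):
        print(run_block("blockA", i), run_block("blockB", i))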
I asked whether they could not make an Alpha chip support the X86/Pentium instruction set. The answer was that they could, but that they would not want to do it. We heard about the new Tsunami ("Tidal Wave" in Japanese) chip set, which allows the very fast memory transfers mentioned above. Quad Tsunami-based SMP systems will probably ship in 1999, with EV67 Alpha CPUs.

Server Systems, Parker Street, Maynard

We heard oblique references to the new WildFire servers (although the codename was never in fact mentioned), which achieve a very scalable architecture for bolting together SMP systems by using memory switches. These systems will sport very low memory latency (about a factor of 2.5 between local and remote node memory access), compared with NUMA-based systems, which can only achieve a factor of 5-10. Predictions were that, with this architecture, we would be seeing 128-processor SMP machines in the year 2000, and 4k-processor machines by the year 2005. The industry is seeing a three-year system architecture lifecycle at present, and this is likely to continue. Digital believe their major competitors in the server market will be SGI and HP/Convex. Memory channel machines will make good OO database servers; Digital are meeting with Objectivity and Versant to discuss this possibility.

Mention was made of SHRIMP (www.cs.princeton.edu/~ida/shrimp.html), out of Princeton. SHRIMP (a Scalable, High-Performance, Really Inexpensive MultiProcessor) is a parallel machine that uses virtual-memory mapping for internodal communication. Built from off-the-shelf components (Pentium PCs and an Intel Paragon routing backplane), SHRIMP permits communication directly at user level while still providing protection for multiple processes and multiple users. It does this by mapping segments of virtual memory between processes running on different nodes.

Software Engineering, Spit Brook Road, Nashua

Microsoft will no longer market their Fortran PowerStation product: it will be replaced by a product from Digital called Visual Fortran, due for release next April. Visual Fortran will use DEC's compiler technology (the MS compiler thus dies), will run natively on Intel (Windows/NT and Windows 95) and Alpha (Windows/NT), and will be fully integrated into Developer Studio. There will be optimisation levels for Pentium, P6, P7 and Alpha, but it will run on 486-based machines too. There will eventually be full access to APIs a la VC++ (e.g. easy access to the winsock DLL). OLE controls will be supported (Fortran as a client). Mixed-language linking will be no problem. Microsoft plans to integrate forms (a la Visual Basic) into Developer Studio, at which time the true Visual Fortran dream will be realized. There will be Fortran module wizards. Of course, the language is full Fortran 90.

I asked why Microsoft had wanted to offload what appeared to be an excellent product in PowerStation. Digital's feeling was that Microsoft were not comfortable with the Fortran marketplace, which tends to be oriented towards big servers. There will be an upgrade policy for existing PowerStation users to Visual Fortran. Visual Basic for Windows/NT on the Alpha is being worked on.

The new front end to Digital's C++ compilers will be fully ANSI compliant. (They are working on an Ada 95 compiler.) We discussed briefly the merits of scientific computing in C++ versus Fortran. From a compiler optimiser's point of view, C++ is much more of a challenge than Fortran (e.g. late binding in C++). We then moved on to talk about Digital's brand new Java project.
They have become Java licensees from Sun. There will be a Java Developer's Kit (JDK) for Digital Unix on Alpha (in fact there is already one available from the OSF Research Institute web page). Subsequently, Digital will release Java as part of the base operating system. At that point a Just In Time (JIT) compiler will be made available; this will be 15-20 times faster than the interpreter. Amazingly, there will also be a run-time Java for OpenVMS! For Windows/NT the timescale is less clear. We discussed how, in fact, many languages can be compiled into byte codes (I had mentioned the recent release of NetREXX from IBM), but C and C++ are a bit tricky (for some reason). The Java project leader enthused about the language and its advantages over C++. He said that many C++ programmers are dissatisfied with the language, and that Java may be the answer. Because of the relative simplicity of Java, he believed there would not be the "methodology flurry" that has accompanied C++.

Another topic that came up was "true clusters" for Digital Unix. These will be the first VMS-like clusters for Unix servers. The current implementation allows clusters of a small number of machines, with interconnects of a few metres. The cluster software incorporates a version of the VMS lock manager, and is designed to be portable to other hardware platforms. It was not clear whether, then, one could envisage a mixed-architecture True Cluster of Digital, HP and SGI machines. First release of a product is optimistically slated for late next year.

MIT Media Lab, Cambridge

I visited the Autonomous Agents group, who are researching software "agents". These are programs that assist people with mundane tasks they would rather not have to do themselves. The Media Lab has a research arm and an academic arm. The research arm is funded at the 80% level by industry. Companies put money in so that they are kept aware of leading-edge media technologies: this is often much cheaper for them than funding their own research departments. A typical contribution is 250 kUSD. For example, ABC News sponsor the Lab: their interest is in fully distributed audio across the Internet. Car manufacturers have been sponsors for some time: their interest has been in intelligent instruments etc. in the car.

The Agents group is addressing Web-based software that lies somewhere between artificial intelligence and human-computer interaction. Common examples of agents include email filters, which filter and sort electronic mail, and the Firefly agent (www.ffly.com/), which sorts and suggests music and films according to user preferences, and which can also put the user in touch with other people who have similar tastes. (The Firefly agent is produced by Agents, Inc., a spin-off company from the Media Lab, who will shortly release an agents SDK.) Thus the gamut of agents addresses problems related to information overload (navigation, filtering, retrieval, recommendation and browsing).

My particular interest was in how some of this technology might be brought to bear on better information transfer amongst people in an LHC experiment, on experiment-specific Web servers, and perhaps on the ensemble of Web servers at CERN. For example, the CMS Web server already hosts many hundreds of documents, and is being added to all the time. It is very hard to find items of interest, and there is no mechanism by which you can be informed when a new item of potential interest is added.
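Purely as an illustration of the kind of agent behaviour described above, and not of any existing Media Lab or CERN software, the sketch below shows how simple the first step of such a notification agent could be: the title of each newly added document is matched against each collaborator's interest keywords, and a mail is sent when there is overlap. The profiles, user names and the send_mail stub are all hypothetical.

    INTEREST_PROFILES = {                 # hypothetical collaborators and their interests
        "jjb": {"calorimeter", "simulation", "java"},
        "abc": {"trigger", "daq"},
    }

    def send_mail(user, message):
        # stand-in for real e-mail delivery
        print(f"mail to {user}: {message}")

    def notify_new_document(title):
        """Match a newly added document title against each user's interests."""
        words = set(title.lower().split())
        for user, interests in INTEREST_PROFILES.items():
            matched = sorted(words & interests)
            if matched:
                send_mail(user, f"new document on {', '.join(matched)}: {title}")

    notify_new_document("Calorimeter simulation results from the 1996 test beam")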
An aspect of dealing with information in a large collaboration like CMS is avoiding duplication of work, keeping everyone up to date on what is going on, and putting people in contact with one another who are working on similar problems. One research project at the Media Lab addresses exactly this problem. The YENTA system is a matchmaking program whose goal is to act as a fully distributed system on behalf of a group of people (e.g. a collaboration or an organisation). The system is fed with information on people in the group (usually provided by the individuals themselves), and then instantiates an agent for each individual. (This can turn into an MPP problem for a large group of users!) The agents then converse among themselves, and inform their clients (e.g. via email) when an interesting match turns up. The agents should observe social norms, and have to be configurable so that they only divulge as much information to other agents as the client user wishes. A specific problem that YENTA solves is that of posting a question to a mailing list: you know that someone on the list can probably answer it, but not who. YENTA obviates the need for the mailing list in the first place.

Another project at the Media Lab is using genetic algorithms to assist with search problems. Users are notoriously bad at formulating search criteria (specifying keywords and such), but are of course experts at evaluating the results from a search. The idea is to use a genetic algorithm to generate the search criteria, which are iteratively modified using the user's feedback on the results. (A small sketch of this idea is appended at the end of this note.)

In the area of information browsing, a project is underway to make a statistical study of how people browse documents, and how successfully they obtain what they are looking for. By looking at a large number of browser transactions, patterns emerge that in principle can assist future users. This work has turned up the need for a higher-level representation than the URL, one that can contain information on the type of document that will be found if the link is followed. This need is probably addressed completely by the new PICS standard (see below).

The LETIZIA project is investigating speculative look-ahead along the links of the Web page currently being read. Browsing usually involves following links that appear first in a document, which tends rapidly to leave the user feeling "lost in the Web". While the user is examining the current document, LETIZIA examines the documents that it links to, and in this way helps to "flatten" the information structure. It can then make suggestions on the most relevant link to be followed next, for example.

Unfortunately, apart from the agents SDK shortly to be released by Agents, Inc., there is no readily available software for authoring new agents: it must all be done from scratch by interfacing to the HTTP server software. If we wanted to develop an HEP Experiment Agent, then the best approach would seem to be either to have one of the Media Lab staff or students come to work closely with us for a few months, or to send someone to work with them for a similar period. I was informed that a joint project such as this would be encouraged by the Lab.

MIT LCS, Cambridge

I met Tim Berners-Lee, of the MIT W3 Consortium. We exchanged news of CERN and the Consortium. In the context of computing technology trends for the LHC, he stressed the importance of making the right choice of APIs (e.g. the AWT for Java) and class libraries. Microsoft's ActiveX, he felt, was going to make a major impact.
He enthused (as usual!) about RPCs. We heartily agreed that it is too early to make any commitment on programming languages, and that good program design is of much greater importance at this stage. We discussed AMAYA, from the Inria branch of the Consortium, which is a free browser with source code. One hope is that this will break the unhealthy duopoly on browsers enjoyed by Netscape and Microsoft, by allowing fruitier capabilities and embellishments than those offered in Netscape and Explorer.

Tim was particularly pleased at the recent agreement in the Consortium on PICS, which was endorsed on the ticket of, for example, implementing parental control over the browsing of Web sites. One nice side effect of this standard is that the PICS server can hold all sorts of extra information (apart from content rating) about Web sites, and can then be used to address the problem already mentioned above, of how to obtain an idea of what a page contains before actually loading it.

On a lighter note, I particularly enjoyed a couple of Tim's stories about Web/technology awards he had been offered but had been unable to take up.
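P.S. To make the Media Lab's genetic-algorithm search idea, mentioned earlier, a little more concrete: the minimal sketch below (my own illustration, not their code) evolves a population of keyword queries, with the user's rating of each query's results standing in as the fitness function. The vocabulary, the rate_results stub and all the parameters are assumptions.

    import random

    VOCABULARY = ["higgs", "calorimeter", "trigger", "muon", "tracking", "test-beam"]
    POP_SIZE, QUERY_LEN, GENERATIONS = 8, 3, 5      # all values assumed

    def rate_results(query):
        """Stand-in for the user scoring the documents a query returns
        (0 = useless, 1 = exactly what was wanted)."""
        return random.random()

    def crossover(q1, q2):
        cut = random.randint(1, QUERY_LEN - 1)
        child = q1[:cut] + [w for w in q2 if w not in q1[:cut]]
        return child[:QUERY_LEN]

    def mutate(query):
        if random.random() < 0.3:                   # mutation rate, assumed
            query[random.randrange(len(query))] = random.choice(VOCABULARY)
        return query

    population = [random.sample(VOCABULARY, QUERY_LEN) for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        ranked = sorted(population, key=rate_results, reverse=True)
        parents = ranked[: POP_SIZE // 2]           # keep the queries the user liked best
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children

    print("best query so far:", max(population, key=rate_results))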