                        OpenSM Release Notes 3.0.13
                       =============================

Version: OpenFabrics Enterprise Distribution (OFED) 1.2
Repo:    git://git.openfabrics.org/~ofed_1_2/management.git (release)
         git://git.openfabrics.org/~halr/management.git (development)
Date:    June 2007

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.2 release.
OpenSM is an InfiniBand-compliant Subnet Manager and Subnet
Administration (SM/SA) implementation that runs on top of OpenIB.
The OpenSM version for this release is openib-3.0.13.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualification (qualified software stacks and devices)

1.1 Major New Features

* Routing improvements
  Two routing algorithms, FAT tree and LASH, have been added, and the
  performance of the existing routing algorithms has been improved.
  See the opensm man page for additional details.

* SA Optional Record support now "virtually" complete
  Includes SA InformInfo improvements and InformInfoRecord support in
  addition to support for the remaining SA optional records
  (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord
  support was improved to include all SMs found.

* SA database dump/restore
  OpenSM now includes the ability to dump and restore the SA database.
  This allows all SA registrations (multicast, services, and events)
  to be saved and restored later.

  In verbose mode, OpenSM dumps the SA DB (existing multicast groups,
  services and InformInfo) into a dump file named "opensm-sa.dump",
  located under the standard OpenSM dump directory (/var/log by default).

  If option -S is specified and an SA DB dump file name is provided,
  OpenSM will try to restore the SA database from this file. If the
  restore is successful, OpenSM won't ask for client reregistration at
  subnet bring-up.

* Modular routing for multicast
  In conjunction with SA database dump/restore, there is the ability to
  dump and load switch LID matrices (min hops tables), which are used
  for multicast route calculation.

* IB router enablement
  OpenSM now supports router ports properly (in terms of PortInfo handling).
  There is also some experimental support for IB routers, which is enabled
  via the ROUTER_EXP compile flag. This support includes SA PathRecord and
  MCMemberRecord support for off-subnet GIDs.

* Socket support added to console
  The OpenSM console now supports remote access in addition to local
  access. Remote access is currently via telnet.

1.2 Minor New Features

* Change output format of DR path from hex to decimal port numbers

* Log rotation
  The OpenSM log can now be rotated while OpenSM is running (without
  stopping and restarting OpenSM). This is accomplished via SIGUSR1.

* Support scope for IPoIB multicast groups in partition config

* Dump filename changed from subnet.lst to osm-subnet.lst
  The default temp directory for non-Windows platforms was previously
  changed from /tmp to /var/log.

* Add option to force SDR link speed
  An option has been added to opensm.opts to force link speed. Currently,
  only forcing to SDR link speed is supported. This option is not
  available as a command line option.

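As a sketch of how the log rotation item above might be driven from a
script, the helper below moves the current log aside and then sends
SIGUSR1 to the running process. The name rotate_log, the ".1" suffix,
the injectable send_signal parameter and the rename-then-signal order
are illustrative assumptions, not part of OpenSM; the only behaviour
taken from these notes is that SIGUSR1 triggers rotation without a
restart.

```python
import os
import signal


def rotate_log(log_path, pid, send_signal=os.kill):
    """Move the current log aside, then signal the process with SIGUSR1.

    log_path    -- path of the live log file (caller supplies it)
    pid         -- process ID of the running daemon (caller supplies it)
    send_signal -- callable(pid, signum); defaults to os.kill, and can be
                   replaced with a stub when trying the helper out
    """
    rotated = log_path + ".1"          # hypothetical naming scheme
    os.replace(log_path, rotated)      # keep the old contents under a new name
    send_signal(pid, signal.SIGUSR1)   # ask the process to rotate its log
    return rotated
```

A caller would pass the path of the OpenSM log and the PID of the
running opensm process; how both are obtained is left open here.
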
1.3 Library API Changes

  None

1.4 Software Dependencies

OpenSM depends on the installation of one of the following: OFED 1.2,
OFED 1.1, OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1
(e.g. IBGD distribution), or Mellanox VAPI stacks. The qualified driver
versions are provided in Table 2, "Qualified IB Stacks".

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  This puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

* No "port down" event handling:
  Changing the switch port through which OpenSM connects to the IB
  fabric may cause incorrect operation. Please restart OpenSM whenever
  such a connectivity change is made.

* Changing connections during SM operation:
  Under some conditions the SM can get confused by a change in
  cabling (moving a cable from one switch port to another) and
  momentarily see the same GUID as connected to two different IB
  ports. If the SM then fails to get the corresponding change event,
  it might mistakenly report this as a "duplicated GUID" case and
  abort. It is advisable to check the syslog after each such change
  in connectivity and restart OpenSM if it has exited. The same error
  ("duplicated GUID") will also appear with a loopback plug.

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request to create an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (OpenSM currently ignores SL and
  FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other, less
critical or less visible bugs were also fixed.

* osm_sminfo_rcv.c: Add SMInfo self query check. OpenSM can occasionally
  query itself for SMInfo due to a port moving during the subnet
  discovery process. Don't create a remote SM entry in this case, to
  prevent deadlocks.

* osm_ucast_updn.c: Two similar bugs in up/down routing fixed.
  8-bit integers were used as indexes when scanning the subnet, which
  in one case caused OpenSM to crash when a ranking "path" was longer
  than 256 switches and, in the other case, caused OpenSM to go into
  an infinite loop when the fabric had more than 256 roots.

* osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req,
  handle master GUID port not found properly

* osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return
  IB_NOT_FOUND rather than IB_ERROR when can't route to LID from switch

* osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND
  rather than IB_ERROR when can't route to LID from switch

* osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to
  -1 on open error

* osm_vendor_ibumad: Termination crash fix
  When OpenSM was terminated, the umad_receiver thread kept running even
  after the structures were destroyed and freed, which caused random (but
  easily reproducible) crashes. The reason is that osm_vendor_delete()
  did not take care of thread termination. This patch adds receiver
  thread cancellation (using pthread_cancel() and pthread_join()) and
  takes care to leave all mutexes unlocked upon termination. There is
  also a minor consolidation of the termination code (the
  osm_vendor_port_close() function).

* osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port

* osm_matrix.h: Fix segfault with up/down and root nodes file

* osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit

* osm_vendor_ibumad.c: Close umad port in osm_vendor_delete

* osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues
  with using MTU/rate/PktLife explicitly ignoring selectors

  Previously, OpenSM just used the resulting path MTU/rate/pkt-life and
  failed the query even though the selector might have allowed an
  appropriate value to be selected.

  After this fix, the following results are obtained for the case of
  a path allowing a maximal 2K MTU.

In standard mode:
------------------------------------------------------------
MTU greater than ... 256     (0x01) ->  equal to ....... 2K
MTU less than ...... 256     (0x41) ->  NO PATHS
MTU equal to ....... 256     (0x81) ->  equal to ....... 256
MTU largest possible 256     (0xc1) ->  equal to ....... 2K
MTU greater than ... 512     (0x02) ->  equal to ....... 2K
MTU less than ...... 512     (0x42) ->  equal to ....... 256
MTU equal to ....... 512     (0x82) ->  equal to ....... 512
MTU largest possible 512     (0xc2) ->  equal to ....... 2K
MTU greater than ... 1K      (0x03) ->  equal to ....... 2K
MTU less than ...... 1K      (0x43) ->  equal to ....... 512
MTU equal to ....... 1K      (0x83) ->  equal to ....... 1K
MTU largest possible 1K      (0xc3) ->  equal to ....... 2K
MTU greater than ... 2K      (0x04) ->  NO PATHS
MTU less than ...... 2K      (0x44) ->  equal to ....... 1K
MTU equal to ....... 2K      (0x84) ->  equal to ....... 2K
MTU largest possible 2K      (0xc4) ->  equal to ....... 2K
MTU greater than ... 4K      (0x05) ->  NO PATHS
MTU less than ...... 4K      (0x45) ->  equal to ....... 2K
MTU equal to ....... 4K      (0x85) ->  NO PATHS
MTU largest possible 4K      (0xc5) ->  equal to ....... 2K
============================================================

With enable_quirks (when one of the ends is a Tavor device):
------------------------------------------------------------
MTU greater than ... 256     (0x01) ->  equal to ....... 1K
MTU less than ...... 256     (0x41) ->  NO PATHS
MTU equal to ....... 256     (0x81) ->  equal to ....... 256
MTU largest possible 256     (0xc1) ->  equal to ....... 2K
MTU greater than ... 512     (0x02) ->  equal to ....... 1K
MTU less than ...... 512     (0x42) ->  equal to ....... 256
MTU equal to ....... 512     (0x82) ->  equal to ....... 512
MTU largest possible 512     (0xc2) ->  equal to ....... 2K
MTU greater than ... 1K      (0x03) ->  NO PATHS
MTU less than ...... 1K      (0x43) ->  equal to ....... 512
MTU equal to ....... 1K      (0x83) ->  equal to ....... 1K
MTU largest possible 1K      (0xc3) ->  equal to ....... 2K
MTU greater than ... 2K      (0x04) ->  NO PATHS
MTU less than ...... 2K      (0x44) ->  equal to ....... 1K
MTU equal to ....... 2K      (0x84) ->  equal to ....... 2K
MTU largest possible 2K      (0xc4) ->  equal to ....... 2K
MTU greater than ... 4K      (0x05) ->  NO PATHS
MTU less than ...... 4K      (0x45) ->  equal to ....... 1K
MTU equal to ....... 4K      (0x85) ->  NO PATHS
MTU largest possible 4K      (0xc5) ->  equal to ....... 2K
============================================================

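The standard-mode results can be reproduced with a small sketch. It
assumes the value in parentheses packs a 2-bit selector (0 = greater
than, 1 = less than, 2 = equal to, 3 = largest possible) above a 6-bit
MTU code (1 = 256, 2 = 512, 3 = 1K, 4 = 2K, 5 = 4K), which is
consistent with the hex values listed. The names resolve_mtu, MTU_NAMES
and SEL_* are hypothetical, this is not the OpenSM implementation, and
the enable_quirks behaviour is not modelled.

```python
# MTU codes carried in the low 6 bits of the field (assumed encoding).
MTU_NAMES = {1: "256", 2: "512", 3: "1K", 4: "2K", 5: "4K"}

# Selector values carried in the top 2 bits of the field (assumed encoding).
SEL_GREATER, SEL_LESS, SEL_EQUAL, SEL_LARGEST = 0, 1, 2, 3


def resolve_mtu(field, path_mtu):
    """Return the MTU code chosen for a request, or None for NO PATHS.

    field    -- selector/MTU byte, e.g. 0x82 means "equal to 512"
    path_mtu -- largest MTU code the path supports (4 for 2K)
    """
    selector = (field >> 6) & 0x3
    requested = field & 0x3F
    if selector == SEL_GREATER:
        # Any value above the request is acceptable; use the path maximum.
        return path_mtu if path_mtu > requested else None
    if selector == SEL_LESS:
        if requested <= min(MTU_NAMES):
            return None  # nothing smaller than 256 exists
        # Largest value strictly below the request that the path allows.
        return min(path_mtu, requested - 1)
    if selector == SEL_EQUAL:
        return requested if requested <= path_mtu else None
    return path_mtu  # SEL_LARGEST: take whatever the path allows


if __name__ == "__main__":
    for field in (0x01, 0x41, 0x82, 0x04, 0x45, 0xC5):
        code = resolve_mtu(field, path_mtu=4)
        print(hex(field), "->", MTU_NAMES.get(code, "NO PATHS"))
```

The point of the fix, as the sketch shows, is that the selector is
honoured: a "less than 4K" request on a 2K path yields 2K rather than
failing outright.
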
* osm_pkey_rcv.c: rwlock double release fix
  When a port was removed from the subnet and a previously requested
  pkey table block was received afterwards, the lock was released twice.
  This led to deadlocks later, when another MAD processor tried to
  acquire the same lock.

* osm_sa_informinfo.c: Fix InformInfoRecord searches

* Better SA MCMemberRecord leave locking
  Hold the lock while processing a multicast group leave request
  (MCMemberRecord). This prevents a race with multicast group join
  requests, in which those requests could be reordered during processing.

* osm_sa_informinfo.c: Conformance changes for subscribe component

* osm_sa_path_record.c: Handle LID 0 as error

* Fix comparing InformInfo records
  1. The received InformInfo struct was modified before dumping it.
  2. The function that compares InformInfo structures was just
     comparing the whole memory allocated for it, including reserved
     fields. Fixed to compare more selectively.

  As for QPN, from the IB spec, table 119 InformInfo:
  QPN : Ignored except when subscribe=0 (an unsubscribe
  request). Queue pair to which Report()s were sent as
  a result of a corresponding subscription. If no
  subscription for this Report() with this QPN exists,
  the request to unsubscribe performs no action and
  produces GetResp() with status indicating an invalid
  field value.

* osm_trap_rcv.c: Reduce repeated trap messages so log doesn't fill
  so quickly

* osm_helper.c: Fix stack smashing detected problem in osm_dump_service_record

* Fix permission on db files directory
  When creating the directory for storing db files (guid2lid), create it
  with reasonable permissions (the old value was 777 decimal, i.e. octal
  01411) and don't make it world writable.

* Fix node_desc.description as string usages

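The db files directory item turns on a decimal/octal mix-up, which a
few lines can make concrete. The helper names describe_mode and
world_writable are hypothetical, and 0o755 is only an illustrative
example of a mode that is not world writable; these notes do not state
which mode the fix actually uses.

```python
import stat


def describe_mode(mode):
    """Return the mode as an octal string and as an ls-style string."""
    return oct(mode), stat.filemode(stat.S_IFDIR | mode)


def world_writable(mode):
    """True if the write bit for 'other' is set."""
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    cases = (
        ("decimal 777", 777),    # the value quoted in the note (octal 01411)
        ("octal 0777", 0o777),   # what a literal 777 is usually meant to be
        ("octal 0755", 0o755),   # illustrative non-world-writable mode
    )
    for label, mode in cases:
        print(label, describe_mode(mode), world_writable(mode))
```

Decimal 777 decodes to the sticky bit plus read-only for the owner and
execute-only for group and other, which is clearly not a deliberate
choice for a directory the SM needs to write to.
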
5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM's capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back-to-back
  or single-switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - where OpenSM is run to set up a large cluster, perform
  hand-off, reboots and reconnects, and verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  record parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service
   - Bad flows of get/delete of a non-valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component mask values (services
     by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
   - Perform some compliant and noncompliant MultiPathRecord requests
   - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
   - Perform some compliant and noncompliant PKeyTableRecord queries
   - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
   - Perform some compliant and noncompliant LinearForwardingTableRecord
     queries
   - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: Send PortInfoRecord queries, both single and RMPP, and
  check the rate of responses as well as their validity.


5.2 IB Management Simulator OpenSM Test Flows

The simulator provides the ability to simulate the SM handling of
virtual topologies that are not limited to actual lab equipment
availability. OpenSM was simulated bringing up clusters of up to 10,000
nodes. Daily regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, duplicated LID and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes, and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, and that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in section 5.1 is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  were added to the test such that both existing and non-existing nodes
  perform them in random order.

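The fault classes named in the LID Manager flow can be illustrated
with a short sketch. With LMC = 2 each port owns 2^LMC = 4 consecutive
LIDs and its base LID must be a multiple of 4. The function names, the
port labels and the fault strings below are hypothetical; this is not
the simulator's checker, only a sketch of the kind of check involved.

```python
def lid_range(base_lid, lmc):
    """LIDs owned by a port whose base LID is base_lid under the given LMC."""
    return range(base_lid, base_lid + (1 << lmc))


def find_lid_faults(assignments, lmc):
    """Map each faulty port to a list of fault descriptions.

    assignments -- dict of port label -> base LID (illustrative input)
    """
    align = 1 << lmc
    owner = {}   # LID -> first port seen that owns it
    faults = {}
    for port in sorted(assignments):
        base = assignments[port]
        if base == 0:
            faults[port] = ["zero LID"]
            continue  # an unassigned port owns no range
        problems = []
        if base % align != 0:
            problems.append("not aligned to LMC")
        clash = {owner[lid] for lid in lid_range(base, lmc) if lid in owner}
        if clash:
            problems.append("overlaps " + ",".join(sorted(clash)))
        for lid in lid_range(base, lmc):
            owner.setdefault(lid, port)
        if problems:
            faults[port] = problems
    return faults


if __name__ == "__main__":
    ports = {"A": 4, "B": 8, "C": 0, "D": 10, "E": 4}
    print(find_lid_faults(ports, lmc=2))
```

In this example port C has a zero LID, port D is misaligned and spills
into B's range, and port E duplicates A's base LID.
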
5.3 OpenSM Regression

Using a back-to-back or single-switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, through randomly
  dropping SMP packets, are used to test OpenSM adaptation to an
  unstable network and to verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component masks. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.

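The single-component-mask part of the SA Query Test can be sketched as
follows. The record fields, the bit assigned to each field and the
sa_query stand-in are all hypothetical and chosen only for
illustration; a real test would query the SA instead of filtering a
local list.

```python
# Hypothetical component-mask bit for each record field.
FIELD_BITS = {"lid": 1 << 0, "port_num": 1 << 1, "mtu": 1 << 2}


def sa_query(records, comp_mask, template):
    """Stand-in for an SA query: match only the fields selected by the mask."""
    fields = [f for f, bit in FIELD_BITS.items() if comp_mask & bit]
    return [r for r in records if all(r[f] == template[f] for f in fields)]


def single_component_cases(records):
    """Yield (mask, template, expected) for every field/value pair present."""
    for field, bit in FIELD_BITS.items():
        for value in sorted({r[field] for r in records}):
            expected = [r for r in records if r[field] == value]
            yield bit, {field: value}, expected


if __name__ == "__main__":
    inventory = [
        {"lid": 1, "port_num": 1, "mtu": 4},
        {"lid": 2, "port_num": 1, "mtu": 4},
        {"lid": 3, "port_num": 2, "mtu": 2},
    ]
    for mask, template, expected in single_component_cases(inventory):
        assert sa_query(inventory, mask, template) == expected
        print(hex(mask), template, len(expected), "record(s)")
```

Classifying the records by field value first is what lets the test know
the expected answer for each query without hard-coding it.
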
5.4 Cluster Testing

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through
the IB interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SMs while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualification
---------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     |   1.2
OFED                                     |   1.1
OFED                                     |   1.0
OpenIB Gen2 (IBG2 distribution)          |   1.0
OpenIB Gen1 (IBGD distribution)          |   1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device  |   FW versions
--------|-----------------------------------------------------------
MT43132 |   InfiniScale - fw-43132  5.2.0 (and later)
MT47396 |   InfiniScale III - fw-47396 0.5.0 (and later)
MT23108 |   InfiniHost - fw-23108   3.3.2 (and later)
MT25204 |   InfiniHost III Lx - fw-25204  1.0.1i (and later)
MT25208 |   InfiniHost III Ex (InfiniHost Mode) - fw-25208  4.6.2 (and later)
MT25208 |   InfiniHost III Ex (MemFree Mode) - fw-25218  5.0.1 (and later)

QLogic/PathScale
Device  |   Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.