                        OpenSM Release Notes 3.1.10
                       =============================

Version: OpenFabrics Enterprise Distribution (OFED) 1.3
Repo:    git://git.openfabrics.org/~ofed_1_3/management.git (release)
         git://git.openfabrics.org/~sashak/management.git (development)
Date:    February 2008

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.3 release.
OpenSM is an InfiniBand compliant Subnet Manager and Subnet
Administration, and runs on top of OpenIB. The OpenSM version for this
release is openib-3.1.10.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 Major New Features

* QoS manager (experimental)
  This QoS manager implementation is in accordance with the IBA QoS
  Annex. A highly configurable QoS policy is parsed from the OpenSM QoS
  policy file. Valid QoS parameters are reported in SA PathRecord and
  MultiPathRecord responses. In addition, a simple configuration of QoS
  levels per ULP is supported.

* Performance Manager
  When enabled, it collects fabric port counters and can log them or
  pass them to an external program via the event plugin interface. It
  handles counter overflow, supports LID/QP redirection, and is able to
  work when OpenSM is in master, standby, and inactive states.

* Dimension Order Routing (DOR) algorithm
  The DOR unicast routing algorithm is based on the Min Hop algorithm,
  but avoids port equalization except for redundant links between the
  same two switches. This provides deadlock-free routes for hypercubes
  when the fabric is cabled as a hypercube and for meshes when cabled
  as a mesh (see details in the OpenSM man page).

* Routing improvements
  Speedup of the current routing algorithms (the default Min Hop,
  Up/Down, and LASH) and of LID matrix generation. The Fat Tree routing
  engine is able to work with topologies that are not pure fat trees.

* Multiple IB routers support
  OpenSM is now able to keep a configurable table that maps subnet
  prefixes to routers. The SA will report a path to these routers when
  an SA PathRecord query is issued with a non-local DGID.

* Node map
  It is possible to name nodes in the node map config file. Those names
  will be used for logging and by the QoS configuration.

* PKey index support
  Proper support for PKey index in GSI queries.

* Incremental LFT, PKey, SL2VL, and VLArbitration table updates
  OpenSM fetches these tables only in the first heavy sweep and
  maintains them internally afterwards.

* Fast port and switch detector
  When a port and/or switch is externally reset so quickly that the
  sweep does not see the device as disconnected, OpenSM will detect the
  reset by the changed port states and handle it accordingly.

* Duplicated GUIDs/port moving detector
  OpenSM is able to detect a port being moved during fabric discovery
  and will not report duplicated GUIDs in this case.

* Multicast rerouting speedup
  OpenSM now calculates and sets up the multicast forwarding tables for
  all altered multicast groups at once, rather than separately for each
  one.

* Event plugin API
  OpenSM can dynamically load various plugin modules.

* Many generic improvements
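
As an illustration of the incremental table updates item above, the
following Python sketch compares a cached linear forwarding table with a
newly computed one and reports which blocks differ, so that only those
blocks would need to be resent. It is not OpenSM code: the function name
and the list-based table are assumptions made for this example. The block
size of 64 entries follows the LinearForwardingTable attribute layout.

```python
# Illustrative sketch only; not taken from the OpenSM sources.
LFT_BLOCK_SIZE = 64  # entries carried by one LinearForwardingTable block


def changed_lft_blocks(cached, wanted):
    """Return the indices of the 64-entry blocks in which the tables differ."""
    if len(cached) != len(wanted):
        raise ValueError("tables must have the same length")
    changed = []
    for start in range(0, len(wanted), LFT_BLOCK_SIZE):
        end = start + LFT_BLOCK_SIZE
        if cached[start:end] != wanted[start:end]:
            changed.append(start // LFT_BLOCK_SIZE)
    return changed


# Hypothetical tables: 192 LIDs, all initially mapped to port 255.
cached = [255] * 192
wanted = list(cached)
wanted[70] = 3    # LID 70 lies in block 1
wanted[130] = 7   # LID 130 lies in block 2
print(changed_lft_blocks(cached, wanted))  # [1, 2]
```

Block 0 is untouched in this example, so a manager following this scheme
would skip it entirely.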

1.2 Minor New Features

* Daemon mode can be activated with the -B option.

* Support for multiple scopes for IPoIB multicast groups in the
  partition config.

* Loopback connection handling
  A loopback connection is no longer interpreted as a duplicated GUID.

* Connect root nodes option for the Up/Down routing engine
  When this option is specified, Up/Down will create routing paths
  between its root nodes.

* Dump and log filenames changed from osm* to opensm*.

* Loopback console support
  Socket console with only local access.

* Configurable config directory (the default value is /etc/opensm) and
  configurable default values of OpenSM config filenames.

* Option to force SDR link speed
  An option was added to opensm.opts to force the link speed. Currently,
  only forcing to SDR link speed is supported. This option is not
  available as a command line option.

* Better packaging
  Building and RPM packaging were improved and simplified.

* Handle "babbling" ports
  When a babbling port (a port which causes frequent trap generation) is
  detected, OpenSM will disable the port, which should terminate the
  trap storm.
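
The babbling port item above does not say how "frequent" is decided. One
common way to detect such a condition is a sliding time window, sketched
below in Python. The class name, the threshold of 3 traps, and the
10-second window are all hypothetical values chosen for this example and
are not OpenSM's actual criteria.

```python
# Illustrative sketch only; thresholds and names are assumptions.
from collections import deque


class TrapRateMonitor:
    """Flags a port once it sends more than max_traps within the window."""

    def __init__(self, max_traps, window_seconds):
        self.max_traps = max_traps
        self.window_seconds = window_seconds
        self._times = {}  # port id -> deque of recent trap timestamps

    def record_trap(self, port, now):
        """Record one trap; return True if the port now looks babbling."""
        times = self._times.setdefault(port, deque())
        times.append(now)
        # Forget traps that have fallen out of the window.
        while now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_traps


monitor = TrapRateMonitor(max_traps=3, window_seconds=10)
flags = [monitor.record_trap("port-1", t) for t in (0, 1, 2, 3, 4, 30)]
print(flags)  # [False, False, False, True, True, False]
```

In this sketch the fourth and fifth traps exceed the limit, and the trap
at time 30 arrives after the earlier ones have aged out of the window.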

1.3 Library API Changes

  None

1.4 Software Dependencies

OpenSM depends on the installation of one of the following stacks:
OFED 1.3, OFED 1.2, OFED 1.1, OFED 1.0, OpenIB gen2 (e.g. IBG2
distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox VAPI.
The qualified driver versions are provided in Table 2, "Qualified IB
Stacks".

Building the QoS manager policy file parser also requires flex and
either bison or byacc to be installed.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits, and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request to create an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other less critical
or visible bugs were also fixed.

* osm_ucast_ftree.c: do load-leveling of non-CN routes

* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches

* lash: fix possible segfault in osm_get_lash_sl()

* osm_ucast_ftree.c: fixing coredump in fat-tree routing

* osm_sa_slvl_record: fix overflow crash

* Break multicast rerouting requests processing when heavy sweep is
  scheduled.

* updn: report fallback properly

* Fix incorrect identification of routing engine used

* Don't zero base LID when invalid value is received

* lash: fix wrong allocation size

* Fixing broken logic in 'process world' part of LinkRecord processing

* Fix lmc_mask bit order in osm_sa_link_record.c

* Adding missing comparison by to_lid/from_lid in LinkRecord processing

* Broken logic when scanning subnet for PIR request

* No interactive games in daemon mode

* Fixing memory leak in node description

* Fix PortInfo update issues for switch port 0

* Changed method_mask type in user_mad interface in accordance with
  kernel ABI

* Use umad_get_issm_path() in osm_vendor_set_sm()

* Report message fix

* Uninitialized variables usage fix

* osm_ucast_ftree.c: Possible NULL ptr seg fault

* osm_mcast_mgr.c: Possible NULL ptr seg fault

* TrapRepress was failing for mkey != 0

* IB_PR_COMPMASK was used in MPR

* Set hop limit when creating ipoib multicast groups

* Fix outstanding mad counters tracking on the error paths.

* Report new ports before handover mastership

* Fix opvls and neighbormtu when remote port invalid.

* Bug in code trying to set vl_arb_high_limit when PortInfo.base_lid
  was still zero.

* Protect SMInfo response against port moving issue.

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM's capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service
   - Bad flows of get/delete of a non-valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component mask values (services
     by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
   - Perform some compliant and noncompliant MultiPathRecord requests
   - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
   - Perform some compliant and noncompliant PKeyTableRecord queries
   - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
   - Perform some compliant and noncompliant LinearForwardingTableRecord
     queries
   - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP and
  check for the rate of responses as well as their validity.
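
To make the MGID cases in the Multicast Member Record flow easier to read,
the Python sketch below splits an MGID into its leading fields: the first
byte (0xFF for a multicast GID), followed by a 4-bit flags field and a
4-bit scope field, where scope 0x2 is link-local. The function name and
the four-word input format are assumptions made for this example, and the
sketch does not reproduce the checks osmtest or the SA actually perform.

```python
# Illustrative sketch only; not the validation logic used by osmtest.
def parse_mgid(words):
    """Decode the leading fields of an MGID.

    words: four 32-bit integers, most significant word first, matching
    the comma-separated notation used in the list above.
    """
    raw = b"".join(w.to_bytes(4, "big") for w in words)
    return {
        "is_multicast": raw[0] == 0xFF,  # multicast GIDs start with 0xFF
        "flags": raw[1] >> 4,            # upper nibble of the second byte
        "scope": raw[1] & 0x0F,          # lower nibble; 0x2 is link-local
    }


good = parse_mgid([0xFF12A01C, 0xFE800000, 0x00000000, 0x12345678])
bad = parse_mgid([0xFA000000, 0, 0, 0])
print(good)  # {'is_multicast': True, 'flags': 1, 'scope': 2}
print(bad["is_multicast"])  # False
```

An MGID whose first byte is 0xFA fails the prefix test outright, which is
consistent with the "BAD MGID=0xFA" case listed above.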

5.2 IB Management Simulator OpenSM Test Flows

The simulator provides the ability to simulate the SM handling of virtual
topologies that are not limited by actual lab equipment availability.
OpenSM was simulated bringing up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, duplicated LID, and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in section 5.1 is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  were added to the test such that both existing and non-existing nodes
  perform them in random order.
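
The fault types named in the LID Manager flow can be made concrete with a
short Python sketch. With LMC = 2 each port owns 2^2 = 4 consecutive LIDs,
so a legal base LID is non-zero and a multiple of 4. The function name and
the port-name mapping below are hypothetical, and the sketch only
classifies the three fault types named above; it is not the simulator's
verification code.

```python
# Illustrative sketch only; not the ibmgtsim verification code.
def find_lid_faults(base_lids, lmc):
    """Classify base LID faults.

    base_lids: mapping of port name -> assigned base LID.
    Returns a mapping of port name -> fault description.
    """
    span = 1 << lmc  # number of LIDs owned by each port
    faults = {}
    first_owner = {}  # base LID -> first port seen using it
    for port, lid in sorted(base_lids.items()):
        if lid == 0:
            faults[port] = "zero LID"
        elif lid % span != 0:
            faults[port] = "not aligned to LMC"
        elif lid in first_owner:
            faults[port] = "duplicate of " + first_owner[lid]
        else:
            first_owner[lid] = port
    return faults


# Hypothetical assignment with one fault of each kind.
assigned = {"a": 4, "b": 0, "c": 6, "d": 4, "e": 8}
print(find_lid_faults(assigned, lmc=2))
# {'b': 'zero LID', 'c': 'not aligned to LMC', 'd': 'duplicate of a'}
```

Ports "a" and "e" hold aligned, unique base LIDs and are therefore the
ones a LID manager could keep unchanged in this example.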

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network and verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component masks. To do that, the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using a component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.
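
The idea behind the SA Query Test can be sketched in Python with an
in-memory stand-in for the SA. The record layout, the field names, and the
sa_query() helper are all hypothetical; in a real run the query would go
to the SA over the fabric, and the field-to-bit mapping would come from
the record definition in the IB specification.

```python
# Illustrative sketch only; sa_query() is a stand-in for a real SA.
FIELDS = ["lid", "port_num", "state"]  # bit i of the mask selects FIELDS[i]


def sa_query(records, comp_mask, template):
    """Return the records matching the template on the masked fields."""
    selected = [f for i, f in enumerate(FIELDS) if comp_mask & (1 << i)]
    return [r for r in records
            if all(r[f] == template[f] for f in selected)]


def check_single_component_queries(records):
    """Query by every (field, value) pair and compare with expectation."""
    checked = 0
    for bit, field in enumerate(FIELDS):
        for value in sorted({r[field] for r in records}):
            expected = [r for r in records if r[field] == value]
            got = sa_query(records, 1 << bit, {field: value})
            assert got == expected, (field, value)
            checked += 1
    return checked


records = [
    {"lid": 1, "port_num": 1, "state": 4},
    {"lid": 2, "port_num": 1, "state": 4},
    {"lid": 2, "port_num": 2, "state": 1},
]
print(check_single_component_queries(records))  # 6
```

The count of 6 comes from two distinct values in each of the three
fields. A component mask of 0 selects no fields, so such a query returns
every record.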

5.4 Cluster testing

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through
the IB interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SMs while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery

6 Qualification
---------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     |   1.3
OFED                                     |   1.2
OFED                                     |   1.1
OFED                                     |   1.0
OpenIB Gen2 (IBG2 distribution)          |   1.0
OpenIB Gen1 (IBGD distribution)          |   1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    |   3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device                              |   FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132  5.2.000 (and later)
InfiniScale III                     | fw-47396  0.5.000 (and later)
InfiniHost                          | fw-23108  3.5.000 (and later)
InfiniHost III Lx                   | fw-25204  1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208  4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218  5.3.000 (and later)
ConnectX IB                         | fw-25408  2.3.000 (and later)

QLogic/PathScale
Device  |   Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)
iPath   | QLE7240
iPath   | QLE7280

Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, OpenSM does support it as a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS is supported by ConnectX. The current FW release
does not support QoS. A QoS-enabled FW release (2_5_000) is
planned for May. Anyone who wishes to get QoS-enabled FW
before the official release should contact a Mellanox FAE.

Switches: QoS is supported by InfiniScale III.
Any InfiniScale III FW that is supported by OpenSM supports QoS.
493