OpenSM Release Notes 3.1.10
=============================

Version: OpenFabrics Enterprise Distribution (OFED) 1.3
Repo:    git://git.openfabrics.org/~ofed_1_3/management.git (release)
         git://git.openfabrics.org/~sashak/management.git (development)
Date:    February 2008

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.3 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is openib-3.1.10.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB compliance statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 Major New Features

* QoS manager (experimental)
  This QoS manager implementation is in accordance with the IBA QoS Annex.
  A highly configurable QoS policy is parsed from the OpenSM QoS policy
  file. Valid QoS parameters are reported in SA PathRecord and
  MultiPathRecord. In addition, a simple configuration of QoS levels per
  ULP is supported.

* Performance Manager
  When enabled, it collects fabric port counters and can log them or
  pass them to an external program via the event plugin interface. It
  handles counter overflow, supports LID/QP redirection, and is able to
  work when OpenSM is in the master, standby, or inactive state.

* Dimension Order routing (DOR) algorithm
  DOR Unicast routing algorithm - based on the Min Hop algorithm, but
  avoids port equalization except for redundant links between the
  same two switches. This provides deadlock free routes for hypercubes
  when the fabric is cabled as a hypercube and for meshes when cabled
  as a mesh (see details in OpenSM man page).

* Routing improvements
  Speedups for the existing routing algorithms (the default Min Hop,
  Up/Down, and LASH) and for LID matrix generation.
  The Fat Tree routing engine is also able to work with topologies that
  are not pure fat trees.

* Multiple IB routers support
  OpenSM is now able to keep a configurable subnet prefix to router table.
  The SA reports a path to these routers when an SA PathRecord query is
  issued with a non-local DGID.

* Node map
  Nodes can be named in the node map config file. Those names are
  used for logging and by the QoS configuration.

* PKey index support
  Proper support for PKey index in GSI queries.

* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates
  OpenSM fetches those tables only in the first heavy sweep and then
  maintains them internally.

* Fast port and switch detector
  When a port and/or switch is externally reset quickly enough that the
  sweep does not see the device as disconnected, OpenSM detects this by
  the changed port states and handles it accordingly.

* Duplicated GUIDs/port moving detector
  OpenSM is able to detect a port moving during fabric discovery
  and does not report duplicated GUIDs in this case.

* Multicast rerouting speedup
  OpenSM now calculates and sets up multicast forwarding tables for
  all altered multicast groups together rather than for each one
  separately.

* Event plugin API
  OpenSM can dynamically load various plugin modules.

* Many generic improvements

1.2 Minor New Features:

* Daemon mode can be activated with the -B option.

* Support multiple scopes for IPoIB multicast groups in partition config.

* Loopback connection handling
  A loopback connection is no longer interpreted as a duplicated GUID.

* Connect root nodes option for Up/Down routing engine.
  When this option is specified, Up/Down will create routing paths between
  its root nodes.

* Dump and log filenames changed from osm* to opensm*.

* Support loopback console
  Socket console with only local access.

* Configurable config directory (the default value is /etc/opensm) and
  configurable default values of OpenSM config filenames.

* Option to force SDR link speed
  An option was added to opensm.opts to force the link speed. Currently,
  only forcing to SDR link speed is supported. This option is not
  supported as a command line option.

* Better packaging
  Building and RPM packaging were improved and simplified.

* Handle "babbling" ports
  When a babbling port (a port which causes frequent trap generation) is
  detected, OpenSM will disable the port, which should terminate the trap
  storm.

1.3 Library API Changes

 None

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1,
OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
distribution), or Mellanox VAPI stacks. The qualified driver versions
are provided in Table 2, "Qualified IB Stacks".

Also, building the QoS manager policy file parser requires flex and either
bison or byacc to be installed.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits, and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If no permission to forward, the subscription should be removed and
  no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request for creating an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other less critical
or visible bugs were also fixed.

* osm_ucast_ftree.c: do load-leveling of non-CN routes

* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches

* lash: fix possible segfault in osm_get_lash_sl()

* osm_ucast_ftree.c: fixing coredump in fat-tree routing

* osm_sa_slvl_record: fix overflow crash

* Break multicast rerouting requests processing when heavy sweep is
  scheduled.

* updn: report fallback properly

* Fix incorrect identification of routing engine used

* Don't zero base LID when invalid value is received

* lash: fix wrong allocation size

* Fixing broken logic in 'process world' part of LinkRecord processing

* Fix lmc_mask bit order in osm_sa_link_record.c

* Adding missing comparison by to_lid/from_lid in LinkRecord processing

* Broken logic when scanning subnet for PIR request

* No interactive games in daemon mode

* Fixing memory leak in node description

* Fix PortInfo update issues for switch port 0

* Changed method_mask type in user_mad interface in accordance with
  kernel ABI

* Use umad_get_issm_path() in osm_vendor_set_sm()

* Report message fix

* Uninitialized variables usage fix

* osm_ucast_ftree.c: Possible NULL ptr seg fault

* osm_mcast_mgr.c: Possible NULL ptr seg fault

* TrapRepress was failing for mkey != 0

* IB_PR_COMPMASK was used in MPR

* Set hop limit when creating ipoib multicast groups

* Fix outstanding mad counters tracking
  on the error paths.

* Report new ports before handover mastership

* Fix opvls and neighbormtu when remote port invalid.

* Bug in code trying to set vl_arb_high_limit when PortInfo.base_lid
  was still zero.

* Protect SMInfo response against port moving issue.

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
   - Register new service
   - Register another service (with a lease period)
   - Register another service (with service p_key set to zero)
   - Get all services by name
   - Delete the first service
   - Delete the third service
   - Bad flows of get/delete of a non-valid service
   - Add / Get same service with different data
   - Add / Get / Delete by different component mask values (services
     by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
   - Query of existing Groups (IPoIB)
   - BAD Join with insufficient comp mask (o15.0.1.3)
   - Create given MGID=0 (o15.0.1.4)
   - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
   - Create BAD MGID=0xFA. (o15.0.1.6)
   - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
   - New MGID with invalid join state (o15.0.1.9)
   - Retry of existing MGID - See JoinState update (o15.0.1.11)
   - BAD RATE when connecting to existing MGID (o15.0.1.13)
   - Partial JoinState delete request - removing FullMember (o15.0.1.14)
   - Full Delete of a group (o15.0.1.14)
   - Verify Delete by trying to Join deleted group (o15.0.1.14)
   - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
   - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
   - Perform some compliant and noncompliant MultiPathRecord requests
   - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
   - Perform some compliant and noncompliant PKeyTableRecord queries
   - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
   - Perform some compliant and noncompliant LinearForwardingTableRecord queries
   - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
   - Send a trap and wait for report
   - Unregister non-existing

* Trap 64/65 Flow:
  Register for traps 64 and 65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP, and
  check the rate of responses as well as their validity.

5.2 IB Management Simulator OpenSM Test Flows:

The simulator provides the ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, duplicated LID, and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes and other errors are randomly
  output to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, as well as that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in section 5.1 is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps.
  InformInfo Set/Delete/Get were added to the test such that both
  existing and non-existing nodes perform them in random order.

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, produced by randomly
  dropping SMP packets, are used to test OpenSM adaptation to an unstable
  network and to verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to every
  possible single component mask. To do that the test examines the
  entire set of records the SA can provide, classifies them by their
  field values and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of records.
  A random selection using multiple component mask bits is also performed.

5.4 Cluster testing:

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through the IB
interface.
The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SMs while performing:
   - Node reboots
   - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery

6 Qualification
----------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     | 1.3
OFED                                     | 1.2
OFED                                     | 1.1
OFED                                     | 1.0
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device                              | FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132  5.2.000 (and later)
InfiniScale III                     | fw-47396  0.5.000 (and later)
InfiniHost                          | fw-23108  3.5.000 (and later)
InfiniHost III Lx                   | fw-25204  1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208  4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218  5.3.000 (and later)
ConnectX IB                         | fw-25408  2.3.000 (and later)

QLogic/PathScale
Device  | Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)
iPath   | QLE7240
iPath   | QLE7280

Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS is supported by ConnectX. The current FW release
does not support QoS. A QoS-enabled FW release (2_5_000) is
planned for May.
Anyone who wishes to get QoS-enabled FW before the official release
should contact a Mellanox FAE.

Switches: QoS is supported by InfiniScale III.
Any InfiniScale III FW that is supported by OpenSM supports QoS.
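
Illustration for the LID Manager flow in section 5.2: that flow injects
zero, duplicated and non-LMC-aligned LIDs and then verifies that every node
ends up with a legal LID range. The sketch below shows one way such a check
could be expressed. It is not taken from OpenSM, osmtest or ibmgtsim; all
function names are hypothetical. It assumes only the standard InfiniBand
rules that a port with a given LMC owns 2^LMC consecutive LIDs starting at
an LMC-aligned base LID, and that unicast LIDs lie in 0x0001-0xBFFF.

```python
# Illustrative only: these names are hypothetical and are not part of
# OpenSM, osmtest, or ibmgtsim.

UNICAST_LID_MIN = 0x0001  # LID 0 is reserved
UNICAST_LID_MAX = 0xBFFF  # 0xC000 and above are multicast LIDs


def lid_range(base_lid, lmc):
    """Return (first, last) of the 2**lmc LIDs owned by a port."""
    return base_lid, base_lid + (1 << lmc) - 1


def is_legal_lid_range(base_lid, lmc):
    """True if the base LID is non-zero, LMC-aligned, and fully unicast."""
    if base_lid == 0:
        return False  # zero LID fault
    if base_lid & ((1 << lmc) - 1):
        return False  # low LMC bits must be clear (alignment fault)
    first, last = lid_range(base_lid, lmc)
    return UNICAST_LID_MIN <= first and last <= UNICAST_LID_MAX


def find_overlaps(base_lids, lmc):
    """Return base LIDs (in input order) whose range collides with an
    earlier port's range (duplicated LID fault)."""
    seen = set()
    clashes = []
    for base in base_lids:
        first, last = lid_range(base, lmc)
        lids = set(range(first, last + 1))
        if lids & seen:
            clashes.append(base)
        seen |= lids
    return clashes
```

With LMC = 2, as used in the LID Manager flow, each port owns four LIDs, so
a base LID of 8 covers LIDs 8 through 11, while a base LID of 6 is rejected
as non-aligned.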