OpenSM Release Notes 3.0.13
=============================

Version: OpenFabrics Enterprise Distribution (OFED) 1.2
Repo:    git://git.openfabrics.org/~ofed_1_2/management.git (release)
         git://git.openfabrics.org/~halr/management.git (development)
Date:    June 2007

1 Overview
----------
This document describes the contents of the OpenSM OFED 1.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release
is openib-3.0.13.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB compliance statements
4 Major Bug Fixes
5 Main Verification Flows
6 Qualified software stacks and devices

1.1 Major New Features

* Routing improvements
  Two new routing algorithms, FAT tree and LASH, have been added, along
  with performance improvements to the existing routing algorithms. See
  the opensm man page for additional details.

* SA Optional Record support now "virtually" complete
  Includes SA InformInfo improvements and InformInfoRecord support in
  addition to support for the remaining SA optional records
  (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord
  support was improved to include all SMs found.

* SA database dump/restore
  OpenSM now includes the ability to dump and restore the SA database.
  This allows all SA registrations (multicast, services, and events)
  to be saved and restored later.

  In verbose mode, OpenSM dumps the SA DB (existing multicast groups,
  services and InformInfo) into a dump file named "opensm-sa.dump",
  located under the standard OpenSM dump directory (/var/log by default).

  If option -S is specified and an SA DB dump file name is provided,
  OpenSM will try to restore the SA database from this file.
  If the restore is successful, OpenSM won't ask for client reregistration
  at subnet bring-up.

* Modular routing for multicast
  In conjunction with SA database dump/restore, there is the ability to
  dump and load switch LID matrices (min hops tables), which are used
  for multicast route calculation.

* IB router enablement
  OpenSM now supports router ports properly (in terms of PortInfo handling).
  There is also some experimental support for IB routers, which is enabled
  via the ROUTER_EXP compile flag. This support includes SA PathRecord and
  MCMemberRecord support for off-subnet GIDs.

* Socket support added to console
  The OpenSM console now supports remote in addition to local access.
  Remote access is currently via telnet.

1.2 Minor New Features

* Change output format of DR path from hex to decimal port numbers

* Log rotation
  The OpenSM log can now be rotated while OpenSM is running (without
  stopping and restarting OpenSM). This is accomplished via SIGUSR1.

* Support scope for IPoIB multicast groups in partition config

* Dump filename changed from subnet.lst to osm-subnet.lst
  The default temp directory for non-Windows platforms was previously
  changed from /tmp to /var/log.

* Add option to force SDR link speed
  Add an option to opensm.opts to force link speed. Currently, only forcing
  to SDR link speed is supported. This option is not supported as a
  command line option.

1.3 Library API Changes

  None

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.2, OFED 1.1,
OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
distribution), or Mellanox VAPI stacks. The qualified driver versions
are provided in Table 2, "Qualified IB Stacks".

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.
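
The SIGUSR1-based log rotation listed under 1.2 Minor New Features usually follows a well-known daemon pattern: an external tool renames the log file, then sends the signal, and the daemon reopens the original path. The C sketch below shows that generic pattern only; it is not OpenSM's implementation, and the helper names install_rotate_handler() and maybe_reopen_log() are hypothetical. The handler only sets a flag, because very little is async-signal-safe; the reopen happens later in the main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler. Writing a volatile sig_atomic_t is one of the
 * few things a handler may safely do, so all real work is deferred. */
static volatile sig_atomic_t rotate_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    rotate_requested = 1;
}

/* Hypothetical helper: install the SIGUSR1 handler.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_rotate_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper, called from the main loop (never from the handler).
 * If a rotation was requested, close the current stream and open the same
 * path again. The old file has already been renamed by an external tool,
 * so this creates a fresh log. Returns the stream to keep using, or NULL
 * if the reopen failed. */
FILE *maybe_reopen_log(FILE *log, const char *path)
{
    if (!rotate_requested)
        return log;
    rotate_requested = 0;
    if (log != NULL)
        fclose(log);
    return fopen(path, "a");
}
```

Under the assumption that the log lives at /var/log/opensm.log, an operator would rename it (for example to opensm.log.1) and then send SIGUSR1 to the OpenSM process. The next main-loop pass then starts writing to a new /var/log/opensm.log.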

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
  Puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or IB access layer core).

* No "port down" event handling:
  Changing the switch port through which OpenSM connects to the IB
  fabric may cause incorrect operation. Please restart OpenSM whenever
  such a connectivity change is made.

* Changing connections during SM operation:
  Under some conditions the SM can get confused by a change in
  cabling (moving a cable from one switch port to another) and
  momentarily see the same GUID appear connected to two different
  IB ports. If the SM fails to get the corresponding change event, it
  might mistakenly report this as a "duplicated GUID" case and abort.
  It is advisable to double-check the syslog after each such change in
  connectivity and restart OpenSM if it has exited. The same error
  ("duplicated GUID") also appears with a loopback plug.

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a work-around, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
  the SM shall generate a SubnGetResp if the M_Key matches, or
  silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If there is no permission to forward, the subscription should be
  removed and no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request for creating an MCG has fields that cannot be met,
  return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Major Bug Fixes
-----------------

The following is a list of bugs that were fixed. Note that other less critical
or visible bugs were also fixed.

* osm_sminfo_rcv.c: Add SMInfo self query check.
  OpenSM can occasionally query itself for SMInfo due to port moving
  during the subnet discovery process. Don't create a remote SM entry in
  this case, to prevent deadlocks.

* osm_ucast_updn.c: Two similar bugs in up/down routing fixed.
  8-bit integers were used as indexes when scanning the subnet, which
  in one case caused OpenSM to crash when the ranking "path" is longer
  than 256 switches, and in the other case caused OpenSM to go into
  an infinite loop when the fabric has more than 256 roots.

* osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req,
  handle master GUID port not found properly

* osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return
  IB_NOT_FOUND rather than IB_ERROR when it can't route to LID from switch

* osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND
  rather than IB_ERROR when it can't route to LID from switch

* osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to
  -1 on open error

* osm_vendor_ibumad: Termination crash fix
  When OpenSM is terminated, the umad_receiver thread is still running even
  after the structures are destroyed and freed; this causes random (but
  easily reproducible) crashes. The reason is that osm_vendor_delete() does
  not handle thread termination. This patch adds receiver thread
  cancellation (using pthread_cancel() and pthread_join()) and takes care
  to keep all mutexes unlocked upon termination. There is also minor
  termination code consolidation - osm_vendor_port_close() function.

* osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port

* osm_matrix.h: Fix segfault with up/down and root nodes file

* osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit

* osm_vendor_ibumad.c: Close umad port in osm_vendor_delete

* osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues
  with using MTU/rate/PktLife explicitly, ignoring selectors

  OpenSM just used the resulting path MTU/rate/pkt-life and failed the
  query, even though the selector might allow selecting an
  appropriate value.

  After this fix, the following results are obtained for a path
  allowing a maximal MTU of 2K.

In standard mode:
------------------------------------------------------------
MTU greater than ... 256 (0x01) -> equal to ....... 2K
MTU less than ...... 256 (0x41) -> NO PATHS
MTU equal to ....... 256 (0x81) -> equal to ....... 256
MTU largest possible 256 (0xc1) -> equal to ....... 2K
MTU greater than ... 512 (0x02) -> equal to ....... 2K
MTU less than ...... 512 (0x42) -> equal to ....... 256
MTU equal to ....... 512 (0x82) -> equal to ....... 512
MTU largest possible 512 (0xc2) -> equal to ....... 2K
MTU greater than ... 1K  (0x03) -> equal to ....... 2K
MTU less than ...... 1K  (0x43) -> equal to ....... 512
MTU equal to ....... 1K  (0x83) -> equal to ....... 1K
MTU largest possible 1K  (0xc3) -> equal to ....... 2K
MTU greater than ... 2K  (0x04) -> NO PATHS
MTU less than ...... 2K  (0x44) -> equal to ....... 1K
MTU equal to ....... 2K  (0x84) -> equal to ....... 2K
MTU largest possible 2K  (0xc4) -> equal to ....... 2K
MTU greater than ... 4K  (0x05) -> NO PATHS
MTU less than ...... 4K  (0x45) -> equal to ....... 2K
MTU equal to ....... 4K  (0x85) -> NO PATHS
MTU largest possible 4K  (0xc5) -> equal to ....... 2K
============================================================

With enable_quirks (when one of the ends is a Tavor device):
------------------------------------------------------------
MTU greater than ... 256 (0x01) -> equal to ....... 1K
MTU less than ...... 256 (0x41) -> NO PATHS
MTU equal to ....... 256 (0x81) -> equal to ....... 256
MTU largest possible 256 (0xc1) -> equal to ....... 2K
MTU greater than ... 512 (0x02) -> equal to ....... 1K
MTU less than ...... 512 (0x42) -> equal to ....... 256
MTU equal to ....... 512 (0x82) -> equal to ....... 512
MTU largest possible 512 (0xc2) -> equal to ....... 2K
MTU greater than ... 1K  (0x03) -> NO PATHS
MTU less than ...... 1K  (0x43) -> equal to ....... 512
MTU equal to ....... 1K  (0x83) -> equal to ....... 1K
MTU largest possible 1K  (0xc3) -> equal to ....... 2K
MTU greater than ... 2K  (0x04) -> NO PATHS
MTU less than ...... 2K  (0x44) -> equal to ....... 1K
MTU equal to ....... 2K  (0x84) -> equal to ....... 2K
MTU largest possible 2K  (0xc4) -> equal to ....... 2K
MTU greater than ... 4K  (0x05) -> NO PATHS
MTU less than ...... 4K  (0x45) -> equal to ....... 1K
MTU equal to ....... 4K  (0x85) -> NO PATHS
MTU largest possible 4K  (0xc5) -> equal to ....... 2K
============================================================

* osm_pkey_rcv.c: rwlock double release fix
  When the port is removed from the subnet, but a previously requested pkey
  table block is received after this, the lock will be released twice.
  This leads to deadlocks later when another MAD processor tries to
  acquire the same lock.

* osm_sa_informinfo.c: Fix InformInfoRecord searches

* Better SA MCMemberRecord leave locking
  Hold the lock during multicast group leave request (MCMemberRecord)
  processing. This prevents a race with multicast group join requests,
  where those requests can be reordered during processing.

* osm_sa_informinfo.c: Conformance changes for subscribe component

* osm_sa_path_record.c: Handle LID 0 as error

* Fix comparing InformInfo records
  1. The received InformInfo struct was modified before dumping it.
  2. The function that compares InformInfo structures was just
     comparing the whole memory allocated for it, including reserved
     fields. Fixed to compare more selectively.

  As for QPN, from the IB spec, table 119 InformInfo:
  QPN : Ignored except when subscribe=0 (an unsubscribe
        request). Queue pair to which Report()s were sent as
        a result of a corresponding subscription. If no
        subscription for this Report() with this QPN exists,
        the request to unsubscribe performs no action and
        produces GetResp() with status indicating an invalid
        field value.

* osm_trap_rcv.c: Reduce repeated trap messages so the log doesn't fill
  so quickly

* osm_helper.c: Fix "stack smashing detected" problem in
  osm_dump_service_record

* Fix permission on db files directory
  When creating the directory for db files (guid2lid), create it with
  reasonable permissions rather than the previous 777 decimal (= octal
  01411), and don't make it world writable.

* Fix node_desc.description as string usages

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors and verify OpenSM capability to
  respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back to
  back or single switch configurations. The regression includes
  multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster, perform
  hand-off, reboots and reconnects, verify routing correctness and SA
  responsiveness at the ULP level (IPoIB and SDP).

5.1 osmtest

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path
  records parameters.

* Service Record:
  - Register new service
  - Register another service (with a lease period)
  - Register another service (with service p_key set to zero)
  - Get all services by name
  - Delete the first service
  - Delete the third service
  - Bad flows of get/delete of a non-valid service
  - Add / Get same service with different data
  - Add / Get / Delete by different component mask values (services
    by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
  - Query of existing Groups (IPoIB)
  - BAD Join with insufficient comp mask (o15.0.1.3)
  - Create given MGID=0 (o15.0.1.4)
  - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
  - Create BAD MGID=0xFA. (o15.0.1.6)
  - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
  - New MGID with invalid join state (o15.0.1.9)
  - Retry of existing MGID - See JoinState update (o15.0.1.11)
  - BAD RATE when connecting to existing MGID (o15.0.1.13)
  - Partial JoinState delete request - removing FullMember (o15.0.1.14)
  - Full Delete of a group (o15.0.1.14)
  - Verify Delete by trying to Join deleted group (o15.0.1.14)
  - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfo Record:
  - All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
  - Perform some compliant and noncompliant MultiPathRecord requests
  - Validation is via status in responses and IB analyzer

* PKeyTableRecord:
  - Perform some compliant and noncompliant PKeyTableRecord queries
  - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
  - Perform some compliant and noncompliant LinearForwardingTableRecord
    queries
  - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
  - Send a trap and wait for report
  - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP, and
  check the rate of responses as well as their validity.


5.2 IB Management Simulator OpenSM Test Flows

The simulator provides the ability to simulate the SM handling of virtual
topologies that are not limited to actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller (16 and 128 node) clusters.

The following test flows are run on the IB management simulator:

* Stability:
  Up to 12 links from the fabric are randomly selected to drop packets
  at drop rates up to 90%. The SM is required to succeed in bringing the
  fabric up. The resulting routing is verified to be correct as well.

* LID Manager:
  Using LMC = 2 the fabric is initialized with LIDs. Faults such as
  zero LID, duplicated LID, and non-aligned (to LMC) LIDs are
  randomly assigned to various nodes, and other errors are randomly
  written to the guid2lid cache file. The SM sweep is run 5 times and
  after each iteration a complete verification is made to ensure that all
  LIDs that could possibly be maintained are kept, and that all nodes
  were assigned a legal LID range.

* Multicast Routing:
  Nodes randomly join the 0xc000 group and eventually the
  resulting routing is verified for completeness and adherence to
  Up/Down routing rules.

* osmtest:
  The complete osmtest flow as described in the previous section is run on
  the simulated fabrics.

* Stress Test:
  This flow merges fabric, LID and stability issues with continuous
  PathRecord, ServiceRecord and Multicast Join/Leave activity to
  stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
  operations were added to the test such that both existing and
  non-existing nodes perform them in random order.

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, through randomly
  dropping SMP packets, used to test OpenSM adaptation to an unstable
  network and verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single component masks. To do that, the test examines the
  entire set of records the SA can provide, classifies them by their
  field values, and then selects every field (using component mask and a
  value) and verifies that the response matches the expected set of
  records. A random selection using multiple component mask bits is also
  performed.

5.4 Cluster Testing

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through
the IB interface. The test procedure includes:

* Cluster bringup

* Hand-off between 2 or 3 SMs while performing:
  - Node reboots
  - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery


6 Qualification
----------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OFED                                     | 1.2
OFED                                     | 1.1
OFED                                     | 1.0
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Mellanox
Device  | FW versions
--------|-----------------------------------------------------------
MT43132 | InfiniScale - fw-43132 5.2.0 (and later)
MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)
MT23108 | InfiniHost - fw-23108 3.3.2 (and later)
MT25204 | InfiniHost III Lx - fw-25204 1.0.1i (and later)
MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)
MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)

QLogic/PathScale
Device  | Note
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.
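
This is a worked illustration of the MTU selector results listed under Major Bug Fixes (the osm_sa_(multipath path)_record.c fix). The C sketch below reproduces both tables for a path whose maximal MTU is 2K. It makes three assumptions. First, it uses the IBA packing of a selector in bits 7:6 and an MTU code in bits 5:0 of the request byte, which matches the hex values shown (0x01 = greater than 256, 0x41 = less than 256, 0x81 = equal to 256, 0xc1 = largest possible). Second, the helper resolve_mtu() is hypothetical. Third, the 1K cap applied to greater-than and less-than selectors in enable_quirks mode is inferred from the table, not taken from OpenSM's source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selector in bits 7:6 of the request byte (0x01, 0x41, 0x81, 0xc1 ...). */
enum mtu_selector {
    SEL_GREATER_THAN = 0,
    SEL_LESS_THAN    = 1,
    SEL_EQUAL_TO     = 2,
    SEL_LARGEST      = 3
};

/* MTU code in bits 5:0: 1=256, 2=512, 3=1K, 4=2K, 5=4K.
 * MTU_NONE stands for the "NO PATHS" result. */
enum mtu_code { MTU_NONE = 0, MTU_256 = 1, MTU_512, MTU_1K, MTU_2K, MTU_4K };

/* Hypothetical helper: given the request byte and the largest MTU code the
 * path allows, return the MTU code the tables report, or MTU_NONE.
 * tavor_quirk models the enable_quirks table, where greater-than and
 * less-than results are capped at 1K (inferred from the table). */
int resolve_mtu(uint8_t request, int path_max, bool tavor_quirk)
{
    int sel = request >> 6;       /* bits 7:6 */
    int value = request & 0x3f;   /* bits 5:0 */
    int cap = path_max;

    if (tavor_quirk && cap > MTU_1K)
        cap = MTU_1K;

    switch (sel) {
    case SEL_GREATER_THAN:  /* best value strictly above the request */
        return cap > value ? cap : MTU_NONE;
    case SEL_LESS_THAN:     /* best value strictly below the request */
        if (value <= MTU_256)
            return MTU_NONE;
        return value - 1 < cap ? value - 1 : cap;
    case SEL_EQUAL_TO:      /* exact match, bounded by the path */
        return value <= path_max ? value : MTU_NONE;
    default:                /* SEL_LARGEST: requested value is ignored */
        return path_max;
    }
}
```

For example, resolve_mtu(0x42, MTU_2K, false) yields MTU_256, matching the "MTU less than ...... 512 (0x42)" row. resolve_mtu(0x01, MTU_2K, true) yields MTU_1K, matching the first enable_quirks row.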