Dell R640 server

This page contains information out Dell PowerEdge R640 could servers which we have deployed in our cluster.

Documentation and software

Dell Support provides R640 information:

Monitoring CPU and power

The turbostat command reports processor topology, frequency, idle power-state statistics, temperature and power on X86 processors. Examples of usage are:

turbostat --Summary --quiet
turbostat --show CoreTmp,PkgTmp,PkgWatt,Bzy_MHz

Dell OpenManage

Download the OpenManage software ISO image from the R640_downloads page in the Systems Management download category.

Download the Dell EMC OpenManage Deployment Toolkit (Linux) DTK ISO file and mount it on /mnt.

Dell EMC System Update (DSU)

Dell EMC System Update (DSU) is a script optimized update deployment tool for applying Dell Update Packages (DUP) to Dell EMC PowerEdge servers. See the DSU manuals.

The DSU may also be configured as a Yum repository, see the DSU page. The commands are:

curl -O https://linux.dell.com/repo/hardware/dsu/bootstrap.cgi
bash bootstrap.cgi

Alternatively, download the Systems-Management_Application_* file and execute it.

This will create the Yum repository file:

/etc/yum.repos.d/dell-system-update.repo

Install RPM packages including iDRAC tools:

yum install dell-system-update srvadmin-idracadm7

Using DSU to preview Dell upgrades:

/usr/sbin/dsu -n -p

To apply Dell upgrades:

/usr/sbin/dsu -u

Systems Management Managed Node Core and CLI

Install the package:

yum install srvadmin-omacore

Disk reports:

omreport storage vdisk                      # List of Virtual Disks in the System
omreport storage pdisk controller=1         # List of Physical Disks on Controller 1
omreport storage pdisk controller=1 vdisk=0 # List of Physical Disks belonging to Virtual Disk0

racadm command

Make a soft link for the racadm command:

ln -s /opt/dell/srvadmin/bin/idracadm7 /usr/local/bin/racadm

Read the Integrated Dell Remote Access Controller 9 RACADM CLI Guide.

There is a useful racadm cheat sheet.

Get Health LED status:

racadm getled

Make the LED blink:

racadm setled -l 1

Stop the LED from blinking:

racadm setled -l 0

Get system and version information:

Service Tag: racadm getsvctag
System info: racadm getsysinfo -s
Versions:    racadm getversion
BIOS:        racadm getversion -b
CPLD:        racadm getversion -c
iDRAC:       racadm getversion -f idrac

Get system logs:

SEL Event Log: racadm getsel
Lifecycle Log: racadm lclog view

Get hardware inventory information:

racadm hwinventory

Clone system configuration with racadm

The racadm command can be used to get and set the system configuration using:

--clone Gets the configuration .xml files without system-related details such as service tag. The .xml file received does not have any virtual disk creation option.

For example:

racadm get --clone -t xml -f config.xml

In the config.xml you may possibly want to delete the line setting the iDRAC password so that your current password is preserved:

<Attribute Name="Users.2#Password">Calvin#SCP#CloneReplace1</Attribute>

To use the config.xml on another server and reboot automatically by default:

racadm set -t xml -f config.xml

To postpone the reboot:

racadm set -t xml -f config.xml -b NoReboot

Add the --preview to just check the operation.

You can also reconfigure just a single setting component with the -c flag, for example:

racadm set -t xml -f config.xml -c NIC.Integrated.1-1-1 -b NoReboot

To configure the UEFI boot order:

racadm set bios.biosbootsettings.UefiBootSeq NIC.PxeDevice.1-1,Disk.SATAEmbedded.A-1

or configure this setting in the config.xml file:

<Attribute Name="UefiBootSeq">NIC.PxeDevice.1-1, Disk.SATAEmbedded.A-1</Attribute>

The server will need to be rebooted, see the racadm set -b NoReboot|Graceful|Forced options in:

racadm help set

The racadm set operation launches an iDRAC job which must complete before you reboot the server. See the job status by:

racadm jobqueue view -i JID_xxxxxx

Setting system parameters

Set the E-mail alerts destination:

racadm set iDRAC.EmailAlert.Address.1 <some-email-address>

View the BIOS boot mode:

racadm get BIOS.BiosBootSettings

To set the boot mode to UEFI at the next reboot:

racadm set BIOS.BiosBootSettings.BootMode Uefi
racadm jobqueue create BIOS.Setup.1-1

Note: It seems that additional UEFI parameters also need to be set (TBD):

UefiBootSeq NIC.PxeDevice.1-1,Disk.SATAEmbedded.A-1
HddPlaceholder Enabled

To enable IPMI over LAN:

racadm set iDRAC.IPMILan.Enable 1

The server needs to be rebooted in order for the new setting to take effect.

Get a list of settings:

racadm get BIOS

To read some current values:

racadm get iDRAC.IPMILan
racadm get BIOS.ProcSettings
racadm get BIOS.SysProfileSettings
racadm get BIOS.SysProfileSettings.WorkloadProfile

See the manual Configuring IPMI over LAN using RACADM.

To enable WakeOnLan first check the installed NICs (network adapters), for example:

racadm get NIC.NICConfig
NIC.NICConfig.1 [Key=NIC.Embedded.1-1-1#NICConfig]
NIC.NICConfig.2 [Key=NIC.Embedded.2-1-1#NICConfig]

View the NIC settings:

racadm get NIC.NICConfig.1

Set the WakeOnLan:

racadm set NIC.NICConfig.1.WakeOnLan Enabled

Then you must create a job for this NIC:

racadm jobqueue create NIC.Embedded.1-1-1

A new setting will only take effect after a system reboot.

PERC H330 RAID controller

The R640 comes with a PERC H330 RAID controller.

By default the installed disks are unallocated, and you have to configure their usage.

Press F2 during start-up to enter the setup menus. Go to the Device Settings menu.

Configure the H330 via the menu item Device Settings and select the RAID controller item:

  • In the RAID controller Main Menu select the Configuration Management item.

  • Change the disk setup into Convert to Non-RAID.

  • In the Controller Management menu item Select Boot Device define the non-RAID disk as the boot device.

Press Finish to save all settings.

raidcfg tool

The OpenManage tool raidcfg can be installed from the above mentioned Dell EMC OpenManage Deployment Toolkit (Linux) folder /mnt/RPMs/rhel7/x86_64/:

yum install raidcfg*rpm

See raidcfg quick reference.

To list installed RAID controllers:

/opt/dell/toolkit/bin/raidcfg controller

perccli tool

The perccli tool for Linux is downloaded from the PowerEdge server’s SAS RAID downloads

Install the RPM (the version may differ):

tar xzf perccli_linux_NF8G9_A07_7.529.00.tar.gz
cd perccli_7.5-007.0529_linux/
yum install perccli-007.0529.0000.0000-1.noarch.rpm
ln -s /opt/MegaRAID/perccli/perccli64 /usr/local/bin/perccli

See the Reference Guide at https://topics-cdn.dell.com/pdf/dell-sas-hba-12gbps_reference-guide_en-us.pdf

Example command:

perccli show

Disk status

This command shows all disks for controller 1:

perccli /c1/eall/sall show

This command shows the RAID rebuild status for controller 1:

perccli /c1/eall/sall show rebuild

Booting and BIOS configuration

Press F2 during start-up to enter the BIOS and firmware setup menus. Go to the BIOS Settings menu.

Minimal configuration of a new server or motherboard

At our site the following minimal settings are required for a new server or a new motherboard. Remaining settings will be configured by racadm.

The Dell iDRAC9 (BMC) setup is accessed via the System Setup menu item iDRAC Settings:

  • In the System Summary page read the NIC iDRAC MAC Address from this page for configuring the DHCP server.

  • In the Network page set the Enable IPMI over LAN to Enabled.

Go to the System Setup menu item Device Settings and select the Integrated NIC items:

  • In the NIC Main Configuration Page select NIC Configuration. We use NIC port 3 (1 Gbit) as the system’s NIC.

  • Read the NIC Ethernet MAC Address from this page for configuring the DHCP server.

  • Select the Legacy Boot Protocol item PXE.

Boot Sequence menu:

  • Click the Boot Sequence item to move PXE boot up above the hard disk boot.

Boot settings menu

  • Boot Mode = BIOS.

  • In the Boot Sequence menu:

    • Click the Boot Sequence item to move PXE boot up above the hard disk boot (if desired).

    • Verify that the correct devices are selected in Boot Option Enable/Disable.

UEFI boot settings

If UEFI boot mode is selected, the following must be enabled before installing the OS for the first time:

  • In the Boot Setting menu:

    • Hard-disk Drive Placeholder = Enabled

Memory settings menu

  • Memory Operating Mode = Optimizer Mode.

  • Node interleaving = Disabled.

  • Opportunistic Self-Refresh = Disabled.

  • ADDDC setting = Disabled.

Adaptive Double DRAM Device Correction (ADDDC) that is available when a system is configured with memory that has x4 DRAM organization (32GB, 64GB DIMMs). ADDDC is not available when a system has x8 based DIMMs (8GB, 16GB) and is immaterial in those configurations. For HPC workloads, it is recommended that ADDDC be set to disabled when available as a tunable option. See BIOS characterization for HPC with Intel Cascade Lake processors.

Processor settings menu

  • Disable Hyperthreading by Logical Processor = Disabled.

  • Virtualization Technology = Disabled.

  • Dell Controlled Turbo = Disabled.

  • Sub NUMA Cluster = Enabled.

The Sub NUMA Cluster (SNC, replaces the older Cluster-on-Die (COD) implementation) has been shown to improve performance, see BIOS characterization for HPC with Intel Cascade Lake processors. This will cause each processor socket to have two NUMA domains for the two memory controllers, so a dual-socket server will have 4 NUMA domains.

Display the NUMA domains by:

$  numactl --hardware
available: 4 nodes (0-3)
...

System Profile Settings menu

  • System Profile = Performance.

System Thermal settings

System Thermal Profile settings can be changed based on the need to maximize performance or power efficiency. This can make CPU thermal throttling less likely.

Read the document Custom Cooling Fan Options for Dell EMC PowerEdge Servers.

In the BIOS setup screen, select iDRAC->Thermal and configure Thermal profile = Maximum performance.

Read the current settings:

racadm get System.ThermalSettings

For HPC applications set the fans to high performance:

racadm set System.ThermalSettings.ThermalProfile "Maximum Performance"
racadm set System.ThermalSettings.MinimumFanSpeed 25

A MinimumFanSpeed value of 255 indicates the Default setting. Values between 21 (the default) and 100 may be used, but high values consume lots of power and generate noise. For HPC systems a MinimumFanSpeed of 40 to 50 may perhaps be useful.

System Security menu

  • AC Power Recovery = Last state.

Miscellaneous Settings menu

  • Keyboard NumLock = Off.

NVDIMM Optane persistent memory setup

Note that Intel has discontinued NVDIMM Optane persistent memory with recent processor generations as described in the Optane_EOL page. Documentation of NVDIMM Optane persistent memory:

Configuration of persistent memory in Dell PowerEdge servers is described in the manual Dell EMC PMem 200 Series User’s Guide in the server documentation:

  • To configure NVDIMM 3D_XPoint known as Intel Optane persistent memory DIMM modules go to the System BIOS Settings boot menus. Select the menu:

    Memory Settings > Persistent Memory > Intel Persistent Memory > Persistent Memory DIMM Configuration
    
  • Memory mode configuration for persistent memory:

    • To create an NVDIMM goal in BIOS, go to the sub-menu Create Goal Config.

    • The BIOS options determine how the goal is created and the PMems are configured:

      Operation Target: Platform - Applies the goal to all the DIMMs in the system (recommended)
      Persistent [%]: 100 - Creates a goal of 100% Persistent memory across the selected PMems
      

Configure persistent memory namespaces

Install this package:

dnf install ndctl

and list all physical devices:

ndctl list -DHi

The configuration of namespaces will decide how much memory capacity to expose to the OS. Create a namespace on each of the persistent memory modules:

ndctl create-namespace

See the manual for ndctl-create-namespace. List namespaces:

ndctl list -N

To correlate a namespace to a PMem device, use the lsblk command.

Managing NVDIMMs with ipmctl

The ipmctl is a utility for configuring and managing Intel® Optane™ Persistent Memory modules (PMem). On EL8 systems install this package from EPEL:

dnf install ipmctl

Read the ipmctl manual page. For example, display the NVDIMM in the system:

$ ipmctl show -dimm
 DimmID | Capacity    | LockState        | HealthState | FWVersion
======================================================================
 0x0001 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1553
 0x1001 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1553

Other useful commands:

$ ipmctl help
$ ipmctl show -topology -socket

PXE boot setup

Go to the System Setup menu item Device Settings and select the Integrated NIC items:

  • In the NIC Main Configuration Page select NIC Configuration. We use NIC port 3 (1 Gbit) as the system’s NIC.

  • Read the NIC Ethernet MAC Address from this page for configuring the DHCP server.

  • Select the Legacy Boot Protocol item PXE.

  • Set Wake On LAN to Enabled.

  • Set the Boot Retry Count = 3 if desired.

  • Disable PXE boot for all unused NICs (port 1).

Press Finish to save all settings.

It is possible to request a one-time PXE boot from the BMC using this IPMItool raw command:

ipmitool -I lanplus -H <BMC-address> -U <username> -P <password> raw 0x00 0x08 0x05 0xa0 0x04 0x00 0x00 0x00

The FreeIPMI command ipmi-raw may also be used.

iDRAC (BMC) setup

The Dell iDRAC9 (BMC) setup is accessed via the System Setup menu item iDRAC Settings:

  • In the System Summary page read the NIC iDRAC MAC Address from this page for configuring the DHCP server.

  • In the Network page set the Enable IPMI over LAN to Enabled.

  • In the User Configuration page set the User 2 (root) Administrator user name and change the password. The Dell iDRAC default password for root is calvin and you will be asked to change this at the first login.

    IMPORTANT: The iDRAC9 keyboard layout is US English! Do not use characters that differ from the US layout!

  • Optional: In the Thermal page set Thermal: Maximum Performance.

Press Finish to save all settings.

SSH login to iDRAC

CLI login to the iDRAC uses SSH as the root user.

If you wish, you may add your management server’s SSH public key to the iDRAC root user account:

racadm sshpkauth -i 2 -k 1 -t "CONTENTS OF SSH PUBLIC KEY"

For further SSH key options:

racadm help sshpkauth

iDRAC IP and DNS information

Read the IP v4/v6 information:

racadm get iDRAC.IPv4
racadm get iDRAC.IPv6

If DHCP is enabled on iDRAC and you want to use the DNS server IP provided by the DHCP server:

racadm set iDRAC.IPv4.DNSFromDHCP 1
racadm set iDRAC.NIC.DNSDomainFromDHCP 1
racadm set iDRAC.NIC.DNSDomainNameFromDHCP 1

The iDRAC DNS Name cannot be obtained from DHCP! Therefore you must always set the DNS name manually:

racadm set iDRAC.NIC.DNSRacName <iDRACNAME>

Manual DNS settings:

  • Set iDRAC domain name:

    racadm set iDRAC.NIC.DNSDomainName <DOMAIN.NAME>
    
  • Set iDRAC DNS Server:

    racadm config -g cfgLanNetworking -o cfgDNSServer1 x.x.x.x
    racadm config -g cfgLanNetworking -o cfgDNSServer2 y.y.y.y
    
  • Set the server’s DNS hostname by:

    racadm  set System.ServerOS.HostName <Server-DNS-name>
    

iDRAC web-server security Host Header enforcement

Starting with iDRAC firmware 5.10, by default, iDRAC9 will check the HTTP / HTTPS Host Header and compare to the DNSRacName and DNSDomainName iDRAC parameters. When the values do not match, the iDRAC will refuse the HTTP / HTTPS connection. This is a security issue recorded in CVE-2021-21510 with the description:

Dell iDRAC8 versions prior to 2.75.100.75 contain a host header injection vulnerability. A remote unauthenticated attacker may potentially exploit this vulnerability by injecting arbitrary ‘Host’ header values to poison a web-cache or trigger redirections

This means that you cannot use the iDRAC’s DNS name to access its web-server! However, you can still connect to the IP-address in stead of the DNS name.

Please read the Dell Knowledge Base article 000193619 HTTP/HTTPS FQDN Connection Failures On iDRAC9 firmware version 5.10.00.00.

In iDRAC9 5.10.00.00, this Host Header enforcement can be disabled with the following RACADM command:

racadm set idrac.webserver.HostHeaderCheck 0

The iDRAC must be rebooted in order to activate the new settings, for example, from the Linux CLI:

ipmitool bmc reset cold

The HostHeaderCheck variable does not exist in firmware 5.00 and earlier!

See the web-server settings with:

racadm get idrac.webserver

View Lifecycle errors

The Lifecycle log can be read by:

racadm lclog view

To select specific events, see help details using:

racadm help lclog view

For example, select events of type Warning since a specific timestamp and show the last 5 events:

racadm lclog view -r "2021-09-01 00:00:00" -s Warning -n 5

View system sensors and power status

Display system sensors including power, temperature and health:

racadm getsensorinfo

View inlet temperature

View the server’s Inlet temperature history:

racadm inlettemphistory get

SMTP alerts from iDRAC

First you must configure the DNS name of the iDRAC, see https://www.dell.com/support/article/us/en/04/sln309388/dell-idrac-how-to-configure-the-email-notifications-for-system-alerts-on-idrac-7-8-and-9?lang=en

In the iDRAC web GUI go to iDRAC Settings->Connectivity->Common Settings and configure the DNS domain name and hostname.

Then configure alerts in Configuration->System Settings->Alert Configuration->Alerts. Then go to the SMTP (Email) Configuration sub-menu and set up SMTP alerts.

TSR reports from iDRAC

TSR system reports for Dell Support cases are normally generated using the iDRAC web interface.

It is also possible to generate TSR reports using the racadm techsupreport subcommand:

racadm techsupreport collect

Check the progress of the report generation with:

racadm jobqueue view

After some minutes export the completed TSR report to a local ZIP file:

racadm techsupreport export -f <filename>.zip

iDRAC server power management

The server power can be managed from the iDRAC web interface under the Dashbord pull-down menu Graceful shutdown.

The iDRAC9 CLI can also be used to manage server power. Use SSH to login to the CLI, and the Help menu states this:

/admin1-> racadm help serveraction
serveraction -- perform system power management operations
Usage:
racadm serveraction <action>
<action>:  server power management operation to perform.  Must be one of:
           graceshutdown   : perform a graceful shutdown of server
           powerdown       : power server off
           powerup         : power server on
           powercycle      : perform server power cycle
           hardreset       : force hard server power reset
           powerstatus     : display current power status of server
           nmi             : Genarate Non-Masking Interrupt to halt system operation

To hard power cycle the server:

racadm serveraction hardreset

LCD front panel display

In the web interface, go to Configurations > System Settings > Hardware Settings > Front Panel configuration.

In the CLI:

racadm help System.LCD.Configuration

For example, set Front LCD to the OS hostname:

racadm set System.LCD.Configuration 16

iDRAC or LifeCycle Controller errors

If the iDRAC controller seems frozen, or if the LifeCycle Controller (LCC) has errors, one should try to perform a deep power drain.

We have seen the R640 LCC going into a Recovery Mode preventing the setting of BIOS parameters using racadm, and an error message on the console:

Couldn't locate device handle for MAS001.. System rebooting

This error was resolved by a deep power drain of the server.

Deep power drain procedure

  • Pull both power cables from the server

  • Hold down the power button for 30 seconds

  • Plug the power cables back in

  • Wait for 30-60 seconds before powering the server on. This will drain the residing power from the capacitors and waiting 30-60 seconds before powering on will allow the iDRAC to complete post.

  • Connect via the idrac and follow the boot process via the virtual or physical console.

iDRAC Easy Restore

See the iDRAC9 User’s Guide:

After you replace the motherboard on your server, Easy Restore allows you to automatically restore the following data:

  • System Service Tag

  • Asset Tag

  • Licenses data

  • UEFI Diagnostics application

  • System configuration settings—BIOS, iDRAC, and NIC

Easy Restore uses the Easy Restore flash memory to back up the data. When you replace the motherboard and power on the system, the BIOS queries the iDRAC and prompts you to restore the backed-up data. The first BIOS screen prompts you to restore the Service Tag, licenses, and UEFI diagnostic application. The second BIOS screen prompts you to restore system configuration settings. If you choose not to restore data on the first BIOS screen and if you do not set the Service Tag by another method, the first BIOS screen is displayed again. The second BIOS screen is displayed only once.

Resetting the iDRAC

The Integrated Dell Remote Access Controller (iDRAC) is responsible for system profile settings and out-of-band management. Sometimes, iDRAC may become unresponsive due to various reasons. Symptoms of unresponsive iDRAC include the following:

  • Racadm command returns “ERROR: Unable to perform requested operation”

  • No ssh/telnet access to the iDRAC (the attempted connection times out)

  • No iDRAC browser access

  • Pinging the iDRAC IP Address fails

The iDRAC can be reset using the System Identification button: