Mail failure due to Cloudflare DNS

On the morning of Friday 20th November there were no emails from my FreeBSD server, opal. These are generated daily by the periodic system under FreeBSD: one email containing a system overview and one containing a security report for the previous 24 hours.

opal was up, so what was the cause of the non-appearance of the emails?

The clue was in /var/log/maillog:

Nov 20 03:17:42 opal sm-msp-queue[83606]: 0AK34MKC083509: to=root, ctladdr=root\
 (0/0), delay=00:13:20, xdelay=00:00:10, mailer=relay, pri=120837, relay=[127.0\
.0.1] [127.0.0.1], dsn=4.0.0, reply=451 4.4.3 Temporary lookup failure of 127.0\
.0.1 at bl.spamcop.net, stat=Deferred: 451 4.4.3 Temporary lookup failure of 12\
7.0.0.1 at bl.spamcop.net
Nov 20 03:17:42 opal sm-msp-queue[83606]: 0AK34dM3083585: to=root, ctladdr=root\
 (0/0), delay=00:13:03, xdelay=00:00:00, mailer=relay, pri=123842, relay=[127.0\
.0.1], dsn=4.0.0, reply=451 4.4.3 Temporary lookup failure of 127.0.0.1 at bl.s\
pamcop.net, stat=Deferred: 451 4.4.3 Temporary lookup failure of 127.0.0.1 at b\
l.spamcop.net

I run my own DNS, forwarding queries to the Cloudflare DNS servers (1.1.1.1 and 1.0.0.1). Hmm, what did nslookup tell me?

  > bl.spamcop.net
  Server:         127.0.0.1
  Address:        127.0.0.1#53

  Non-authoritative answer:
  Name:   bl.spamcop.net
  Address: 184.94.240.110
  ** server can't find bl.spamcop.net: SERVFAIL

Bizarre that it returns the address, and then returns a SERVFAIL. I tried with Google DNS:

  > server 8.8.8.8
  Default server: 8.8.8.8
  Address: 8.8.8.8#53
  > bl.spamcop.net
  Server:         8.8.8.8
  Address:        8.8.8.8#53

  Non-authoritative answer:
  Name:   bl.spamcop.net
  Address: 184.94.240.110
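
For reference, the name sendmail was failing on is not bl.spamcop.net itself. By convention, a DNSBL check reverses the octets of the client address and looks that up under the list's zone, so for 127.0.0.1 the query name is 1.0.0.127.bl.spamcop.net. Here is a minimal Python sketch of that convention. The function names are my own invention, and the live lookup is only a rough illustration of telling "not listed" apart from a temporary failure, not a description of what sendmail does internally.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def dnsbl_lookup(ip: str, zone: str) -> str:
    """Classify a DNSBL lookup through the system resolver (hypothetical helper)."""
    name = dnsbl_query_name(ip, zone)
    try:
        socket.getaddrinfo(name, None)
    except socket.gaierror as exc:
        # EAI_AGAIN is the resolver's "temporary failure" code, which is the
        # situation a mail server would answer with a 4xx deferral.
        if exc.errno == socket.EAI_AGAIN:
            return "temporary failure"
        return "not listed"
    return "listed"


if __name__ == "__main__":
    print(dnsbl_query_name("127.0.0.1", "bl.spamcop.net"))
    print(dnsbl_lookup("127.0.0.1", "bl.spamcop.net"))
```

Looking up the reversed name with nslookup, rather than the bare zone, would test the same query the mail server was making.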

This failure was affecting both incoming and outgoing mail:

  # mailq -Ac
  /var/spool/clientmqueue (4 requests)
  -----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
  0AK7HWmR084402     2501 Fri Nov 20 07:17 MAILER-DAEMON
                   (Deferred: 451 4.4.3 Temporary lookup failure of 127.0.0.1 at)
                       root
  0AK7HWmS084402     5506 Fri Nov 20 07:17 MAILER-DAEMON
                   (Deferred: 451 4.4.3 Temporary lookup failure of 127.0.0.1 at)
                       root
  0AK34MKC083509      787 Fri Nov 20 03:04 root
                   (Deferred: 451 4.4.3 Temporary lookup failure of 127.0.0.1 at)
                       root
  0AK34dM3083585     3801 Fri Nov 20 03:04 root
                   (Deferred: 451 4.4.3 Temporary lookup failure of 127.0.0.1 at)
                       root
		Total requests: 4

I switched the forwarders in the local DNS to the OpenDNS servers, and mail started flowing again.

Stumbling over another mail error

While reading the sendmail log, I found an unrelated issue with dovecot:

Nov 20 10:14:37 opal dovecot[32592]: imap-login: Disconnected (no auth attempts in 0 secs): user=<>, rip=192.168.0.253, lip=192.168.0.4, TLS handshaking: SSL_accept() failed: error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown: SSL alert number 46, session=<+iujHIe0YJzAqAD9>
Nov 20 10:14:37 opal dovecot[32592]: imap-login: Disconnected (no auth attempts in 0 secs): user=<>, rip=192.168.0.253, lip=192.168.0.4, TLS handshaking: SSL_accept() failed: error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown: SSL alert number 46, session=<TASlHIe0YpzAqAD9>

These errors are caused by GMail access from an Android phone. I remember a similar issue from a few months ago, which was cured by deleting the email account and re-creating it. So that's what I did this time, except that the setup process told me the outgoing mail server did not offer STARTTLS. That's not right.

However, GMail was right...

  [mark@opal:~]$ telnet localhost 25
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
  220 opal.hydrus.org.uk ESMTP Sendmail 8.16.1/8.16.1; Fri, 20 Nov 2020 10:41:41 GMT
  EHLO localhost
  250-opal.hydrus.org.uk Hello localhost [127.0.0.1], pleased to meet you
  250-ENHANCEDSTATUSCODES
  250-PIPELINING
  250-8BITMIME
  250-SIZE
  250-DSN
  250-ETRN
  250-AUTH DIGEST-MD5 CRAM-MD5
  250-DELIVERBY
  250 HELP
  QUIT
  221 2.0.0 opal.hydrus.org.uk closing connection
  Connection closed by foreign host.

No STARTTLS capability was being shown. By now I had restarted the sendmail service many times, which is how I finally managed to spot the error:

  Nov 20 12:04:20 opal sm-mta[88853]: STARTTLS=server, error: SSL_CTX_check_private_key failed (PATHNAME ELIDED): 0
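
The manual telnet check can also be scripted, which is handy when comparing two servers. The sketch below is an illustration only: ehlo_extensions() parses a pasted EHLO reply, and offers_starttls() asks a live server using Python's smtplib. Both function names are my own, and the port and timeout are assumptions.

```python
import smtplib


def ehlo_extensions(response: str) -> set:
    """Return the upper-cased extension keywords from a multi-line EHLO reply."""
    keywords = set()
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    # The first line is the greeting; each later line is "250-KEYWORD [params]"
    # or, for the final line, "250 KEYWORD [params]".
    for line in lines[1:]:
        if line.startswith("250") and len(line) > 4:
            keywords.add(line[4:].split()[0].upper())
    return keywords


def offers_starttls(host: str, port: int = 25) -> bool:
    """Connect, send EHLO and report whether STARTTLS is advertised."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return smtp.has_extn("starttls")


if __name__ == "__main__":
    print(offers_starttls("localhost"))
```

Run against each server in turn, this gives a quick yes/no answer without reading through the whole EHLO reply.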

Sendmail was showing STARTTLS on crimson, the backup server, so I compared configurations. I found a key difference in the sendmail configuration file (<machine_name>.mc):

crimson

  define(`confSERVER_CERT', `CERT_DIR/cert.pem')dnl

opal

  define(`confSERVER_CERT', `CERT_DIR/chain.pem')dnl

Yet another foot-shooting incident. I must have changed this at some point in the past, for reasons I can no longer remember, and hadn't noticed the effect. Shows how often I send mail out from a remote client using hydrus.org.uk.

Now, the fullchain.pem file is used to supply the SERVER_CERT and all is good.
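
The SSL_CTX_check_private_key error means the private key did not match the certificate sendmail loaded. If these files follow the usual Let's Encrypt naming (an assumption on my part), chain.pem holds only the intermediate certificates, so the server's key can never match it. A quick way to test a certificate/key pair before restarting sendmail is sketched below using Python's ssl module. The function name and the file paths in the usage lines are hypothetical, not the real ones from this setup.

```python
import ssl


def cert_matches_key(certfile: str, keyfile: str) -> tuple:
    """Return (True, "") if keyfile matches the first certificate in certfile."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain() raises ssl.SSLError when the key and the
        # certificate do not belong together.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except ssl.SSLError as exc:
        return (False, "TLS library rejected the pair: %s" % exc)
    except OSError as exc:
        return (False, "could not read files: %s" % exc)
    return (True, "")


if __name__ == "__main__":
    # Hypothetical paths; substitute the real certificate directory and key.
    for cert in ("certs/chain.pem", "certs/fullchain.pem"):
        print(cert, cert_matches_key(cert, "certs/key.pem"))
```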

At present, Cloudflare is resolving bl.spamcop.net without error, but there does seem to be a small delay after the address is returned in nslookup.