Reply to: Neural Network Back-Propagation Revisited with Ordinary Differential Equations

This article is worth reading: Neural Network Back-Propagation Revisited with Ordinary Differential Equations

I replied:

Thank you very much for this very informative article providing many links, the Python code, and the results.

According the mentioned paper from Owens + Filkin the speedup expected by using a stiff ODE solver should be two to 1,000. So your results demonstrate that the number of iterations clearly is way less than for gradient descent and all its variants. Unfortunately, the run times reported are way slower than for gradient descent. This was not expected.

The following factors could play a role in this:

  1. All the methods search for a local minimum (gradient should be zero, Hessian is not checked or known). They do not necessarily find a global minimum. So when these different methods are run, they each probably iterate towards different local minima. So each methods likely has done something different.
  2. I wonder why you have used the zvode solver from scipy.integrate. I would recommend vode, or even better lsoda. Your chosen tolerances are quite high (atol=1e-8, rtol=1e-6), at least for the beginning. These high tolerances may force smaller step-sizes than actually required. In particular, as the start-values are random, there seems to be no compelling reason to use these strict tolerances right from the start. Also it is known, that strict tolerances might lead the ODE code to use higher order BDF which in turn are not stable enough for highly stiff ODEs. Only BDF up to order 2 are A-stable. So atol=1e-4, rtol=1e-4 might show different behaviour.
  3. Although it is expected that the resulting ODE is stiff, it might be the case that in your particular setting the system was only very mildly stiff. Your charts give some indications of that, at least in the beginning. This can be checked by simply re-running with a non-stiff code, e.g., like dopri5. Again with lower tolerances.
  4. As can be seen in your chart, above learning accuracy 80% the ODE solver takes a huge number of steps. It would be interesting to know whether at this stage there are many Jacobian evaluations for the ODE solver, or whether there are many rejected steps due to Newton convergence failures within the ODE code.
  5. scipy.integrate apparently does not allow for automatic differentiation. Therefore the ODE solver must resort to numerical differencing for evaluating the Jacobian, which is very slow. So using something like Zygote might improve here.

I always wondered why the findings from Owens+Filkin where not widely adopted. Your paper provides an answer, although a negative one. Taking into
account the above points, I still have hope that stiff ODE solvers have great potential for machine learning with regard to performance. You already mentioned that ODE solvers provide the benefit that hyperparameters no longer need to be estimated.

Calling C from Julia

Two ways to compute the error function or Bessel function in Julia.

1. Calling C. On UNIX libm provides erf() and j0(). So calling them goes like this:


In this case one can omit the reference to Watch out for the funny looking (Float64,).

2. Using Julia. SpecialFunctions.jl provides erf() and besselj0.

import Pkg
import SpecialFunctions

Splitting and anti-merging vCard files

Sometimes vCard files need to be split into smaller files, or the file needs to be protected against merging in another application.

1. Splitting. Below Perl script splits the input file into as many files as required. Output files are named adr1.vcf, adr2.vcf, etc. You can pass a command line argument “-n” to specify the number of card records per file. Splitting a vCard file is provided in palmadrsplit on GitHub:

use Getopt::Std;

my %opts;
my ($i,$k,$n) = (1,0,950);
$n = ( defined($opts{'n'}) ? $opts{'n'} : 950 );

open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
while (<>) {
        if (/BEGIN:VCARD/) {
                if (++$k % $n == 0) {   # next address record
                        close(F) || die("Cannot close adr.$i.vcf");
                        ++$i;   # next file number
                        open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
        print F $_;
close(F) || die("Cannot close adr.$i.vcf");

This is required for Google Contacts, as Google does not allow to import more than 1,000 records per day, see Quotas for Google Services.

2. Anti-Merge. Inhibiting annoying merging is given in file palmantimerge on GitHub. Overall logic is as follows: Read entire vCard file and each card, delimited by BEGIN:VCARD and END:VCARD, is put on a hashmap. Each hashmap entry is a list of vCards. Hash key is the N: entry, i.e., the concatentation of lastname and firstname. Once everything is hashed, then walk through hash. Those hash entries, where the list contains just one entry, can be output as is. Where the list contains more than one entry, then these entries would otherwise be merged, and then the N: part is modified by using the ORG: field.

use strict;
my @singleCard = ();    # all info between BEGIN:VCARD and END:VCARD
my ($name) = "";        # N: part, i.e., lastname semicolon firstname
my ($clashes,$line,$org) = (0,"","");
my %allCards = {};      # each entry is list of single cards belonging to same first and lastname, so hash of array of array

while (<>) {
        if (/BEGIN:VCARD/) {
                ($name,@singleCard) = ("", ());
                push @singleCard, $_;
        } elsif (/END:VCARD/) {
                push @singleCard, $_;
                push @{ $allCards{$name} }, [ @singleCard ];
        } else {
                push @singleCard, $_;
                $name = $_ if (/^N:/);

for $name (keys %allCards) {
        $clashes = $#{$allCards{$name}};
        for my $sglCrd (@{$allCards{$name}}) {
                if ($clashes == 0) {
                        for $line (@{$sglCrd}) { print $line; }
                } else {
                        $org = "";
                        for $line (@{$sglCrd}) {
                                $org = $1 if ($line =~ /^ORG:([ \-\+\w]+)/);
                        for $line (@{$sglCrd}) {
                                $line =~ s/;/ \/${org}\/;/ if ($line =~ /^N:/);
                                print $line;

Every lastname is appended with “/organization/” if the combination of firstname and lastname is not unique. For example, two records with Peter Miller in ABC-Corp and XYZ-Corp, will be written as N:Miller /ABC-Corp/;Peter and N:Miller /XYZ-Corp/;Peter.

This way Simple Mobile Tools Contacts will not merge records together which it shouldn’t. Issue #446 for this is on GitHub.

Automated Rebooting of Auerswald Communication System

The wired telephones in my house are connected to a telephone-system from Auerswald. This PBX handles VoIP and ISDN. My children make fun of me that I still use landlines, they just use cell phones.

Unfortunately since a couple of months the system no longer is fully reliable and needs constant reboots, for unknown reasons. I deleted the entire call history, in the hope that this reduced storage would alleviate the problem, but this did not help. So I had to automate the reboots. The script below mimics the login-screen and the reboot-screen, as shown below. To figure out the details of the login screen I used the network analyzer of Firefox to see which URL and which commands are sent to the web-server.

Script is:

# Login
curl -s -o AW_Login.html -d LOGIN_NOW=true -d jssupport=true -d timeout=200 -d LOGIN_NAME=admin -d LOGIN_PASS=SecretPasswd -d Anmelden=Anmelden -c AW_cookieJar http://tk/login

sleep 2

# Ask for reboot
curl -s -o AW_Reboot.html -b AW_cookieJar -d reboottimevalue=0 -d rebootpbx=Neustart 'http://tk/updownload?timereboot_PBX=1&reboottime=0'

The telephone-system has DNS name tk. This name is arbitrary.

This script is then run via cron.

Final remark: Unfortunately, even after daily rebooting, the telephone system still is misbehaving on a random basis.