An I/O controller for virtual pinball machines: accelerometer nudge sensing, analog plunger input, button input encoding, LedWiz compatible output controls, and more.

Dependencies:   mbed FastIO FastPWM USBDevice

Fork of Pinscape_Controller by Mike R

/media/uploads/mjr/pinscape_no_background_small_L7Miwr6.jpg

This is Version 2 of the Pinscape Controller, an I/O controller for virtual pinball machines. (You can find the old version 1 software here.) Pinscape is software for the KL25Z that turns the board into a full-featured I/O controller for virtual pinball, with support for accelerometer-based nudging, a mechanical plunger, button inputs, and feedback device control.

In case you haven't heard of the idea before, a "virtual pinball machine" is basically a video pinball simulator that's built into a real pinball machine body. A TV monitor goes in place of the pinball playfield, and a second TV goes in the backbox to show the backglass artwork. Some cabs also include a third monitor to simulate the DMD (Dot Matrix Display) used for scoring on 1990s machines, or even an original plasma DMD. A computer (usually a Windows PC) is hidden inside the cabinet, running pinball emulation software that displays a life-sized playfield on the main TV. The cabinet has all of the usual buttons, too, so it not only looks like the real thing, but plays like it too. That's a picture of my own machine to the right. On the outside, it's built exactly like a real arcade pinball machine, with the same overall dimensions and all of the standard pinball cabinet trim hardware.

It's possible to buy a pre-built virtual pinball machine, but it also makes a great DIY project. If you have some basic wood-working skills and know your way around PCs, you can build one from scratch. The computer part is just an ordinary Windows PC, and all of the pinball emulation can be built out of free, open-source software. In that spirit, the Pinscape Controller is an open-source software/hardware project that offers a no-compromises, all-in-one control center for all of the unique input/output needs of a virtual pinball cabinet. If you've been thinking about building one of these, but you're not sure how to connect a plunger, flipper buttons, lights, nudge sensor, and whatever else you can think of, this project might be just what you're looking for.

You can find much more information about DIY Pin Cab building in general in the Virtual Cabinet Forum on vpforums.org. Also visit my Pinscape Resources page for more about this project and other virtual pinball projects I'm working on.

Downloads

  • Pinscape Release Builds: This page has download links for all of the Pinscape software. To get started, install and run the Pinscape Config Tool on your Windows computer. It will lead you through the steps for installing the Pinscape firmware on the KL25Z.
  • Config Tool Source Code: The complete C# source code for the config tool. You don't need this to run the tool, but it's available if you want to customize anything or see how it works inside.

Documentation

The new Version 2 Build Guide is now complete! This new version aims to be a complete guide to building a virtual pinball machine, including not only the Pinscape elements but all of the basics, from sourcing parts to building all of the hardware.

You can also refer to the original Hardware Build Guide (PDF), but that's out of date now, since it refers to the old version 1 software, which was rather different (especially when it comes to configuration).

System Requirements

The new Config Tool requires a fairly up-to-date Microsoft .NET installation. If you use Windows Update to keep your system current, you should be fine. A modern version of Internet Explorer (IE) is required, even if you don't use it as your main browser, because the Config Tool uses some system components that Microsoft packages into the IE install set. I test with IE11, so that's known to work. IE8 doesn't work. IE9 and 10 are unknown at this point.

The Windows requirements are only for the config tool. The firmware doesn't care about anything on the Windows side, so if you can make do without the config tool, you can use almost any Windows setup.

Main Features

Plunger: The Pinscape Controller started out as a "mechanical plunger" controller: a device for attaching a real pinball plunger to the video game software so that you could launch the ball the natural way. This is still, of course, a central feature of the project. The software supports several types of sensors: a high-resolution optical sensor (which works by essentially taking pictures of the plunger as it moves); a slide potentiometer (which determines the position via the changing electrical resistance in the pot); a quadrature sensor (which counts bars printed on a special guide rail that it moves along); and an IR distance sensor (which determines the position by sending pulses of light at the plunger and measuring the round-trip travel time). The Build Guide explains how to set up each type of sensor.

Nudging: The KL25Z (the little microcontroller that the software runs on) has a built-in accelerometer. The Pinscape software uses it to sense when you nudge the cabinet, and feeds the acceleration data to the pinball software on the PC. This turns physical nudges into virtual English on the ball. The accelerometer is quite sensitive and accurate, so we can measure the difference between little bumps and hard shoves, and everything in between. The result is natural and immersive.

Buttons: You can wire real pinball buttons to the KL25Z, and the software will translate the buttons into PC input. You have the option to map each button to a keyboard key or joystick button. You can wire up your flipper buttons, Magna Save buttons, Start button, coin slots, operator buttons, and whatever else you need.

Feedback devices: You can also attach "feedback devices" to the KL25Z. Feedback devices are things that create tactile, sound, and lighting effects in sync with the game action. The most popular PC pinball emulators know how to address a wide variety of these devices, and know how to match them to on-screen action in each virtual table. You just need an I/O controller that translates commands from the PC into electrical signals that turn the devices on and off. The Pinscape Controller can do that for you.

Expansion Boards

There are two main ways to run the Pinscape Controller: standalone, or using the "expansion boards".

In the basic standalone setup, you just need the KL25Z, plus whatever buttons, sensors, and feedback devices you want to attach to it. This mode lets you take advantage of everything the software can do, but for some features, you'll have to build some ad hoc external circuitry to interface external devices with the KL25Z. The Build Guide has detailed plans for exactly what you need to build.

The other option is the Pinscape Expansion Boards. The expansion boards are a companion project, which is also totally free and open-source, that provides Printed Circuit Board (PCB) layouts that are designed specifically to work with the Pinscape software. The PCB designs are in the widely used EAGLE format, which many PCB manufacturers can turn directly into physical boards for you. The expansion boards organize all of the external connections more neatly than on the standalone KL25Z, and they add all of the interface circuitry needed for all of the advanced software functions. The big thing they bring to the table is lots of high-power outputs. The boards provide a modular system that lets you add boards to add more outputs. If you opt for the basic core setup, you'll have enough outputs for all of the toys in a really well-equipped cabinet. If your ambitions go beyond merely well-equipped and run to the ridiculously extravagant, just add an extra board or two. The modular design also means that you can add to the system over time.

Expansion Board project page

Update notes

If you have a Pinscape V1 setup already installed, you should be able to switch to the new version pretty seamlessly. There are just a couple of things to be aware of.

First, the "configuration" procedure is completely different in the new version. Way better and way easier, but it's not what you're used to from V1. In V1, you had to edit the project source code and compile your own custom version of the program. No more! With V2, you simply install the standard, pre-compiled .bin file, and select options using the Pinscape Config Tool on Windows.

Second, if you're using the TSL1410R optical sensor for your plunger, there's a chance you'll need to boost your light source's brightness a little bit. The "shutter speed" is faster in this version, which means that it doesn't spend as much time collecting light per frame as before. The software actually does "auto exposure" adaptation on every frame, so the increased shutter speed really shouldn't bother it, but it does require a certain minimum level of contrast, which requires a certain minimal level of lighting. Check the plunger viewer in the setup tool if you have any problems; if the image looks totally dark, try increasing the light level to see if that helps.

New Features

V2 has numerous new features. Here are some of the highlights...

Dynamic configuration: as explained above, configuration is now handled through the Config Tool on Windows. It's no longer necessary to edit the source code or compile your own modified binary.

Improved plunger sensing: the software now reads the TSL1410R optical sensor about 15x faster than it did before. This allows reading the sensor at full resolution (400dpi), about 400 times per second. The faster frame rate makes a big difference in how accurately we can read the plunger position during the fast motion of a release, which allows for more precise position sensing and faster response. The differences aren't dramatic, since the sensing was already pretty good even with the slower V1 scan rate, but you might notice a little better precision in tricky skill shots.

Keyboard keys: button inputs can now be mapped to keyboard keys. The joystick button option is still available as well, of course. Keyboard keys have the advantage of being closer to universal for PC pinball software: some pinball software can be set up to take joystick input, but nearly all PC pinball emulators can take keyboard input, and nearly all of them use the same key mappings.

Local shift button: one physical button can be designated as the local shift button. This works like a Shift key on a keyboard, but with cabinet buttons. It allows each physical button on the cabinet to have two PC keys assigned, one normal and one shifted. Hold down the local shift button, then press another button, and that button's shifted key mapping is sent to the PC. The shift button can have a regular key mapping of its own as well, so it can do double duty. The shift feature lets you access more functions without cluttering your cabinet with extra buttons. It's especially nice for less frequently used functions like adjusting the volume or activating night mode.
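To make the shift behavior concrete, here is a minimal sketch of how a button's key code might be chosen. It is not taken from the Pinscape firmware: the `ButtonMap` struct, the `resolveKey` function, and the rule that a button with no shifted mapping falls back to its normal key are all assumptions made for illustration.

```cpp
#include <stdint.h>

// Hypothetical per-button mapping: one key code sent normally, and one
// sent while the local shift button is held.  A shiftedKey of 0 means
// "no shifted mapping" (an assumption for this sketch).
struct ButtonMap {
    uint8_t normalKey;
    uint8_t shiftedKey;
};

// Choose the key code to report for a button that is currently pressed.
// If shift is held and the button has a shifted mapping, use it;
// otherwise fall back to the normal mapping.
uint8_t resolveKey(const ButtonMap &button, bool shiftHeld)
{
    if (shiftHeld && button.shiftedKey != 0)
        return button.shiftedKey;
    return button.normalKey;
}
```

With a mapping such as `{0x04, 0x80}`, `resolveKey` returns `0x04` when shift is up and `0x80` when it is held.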

Night mode: the output controller has a new "night mode" option, which lets you turn off all of your noisy devices with a single button, switch, or PC command. You can designate individual ports as noisy or not. Night mode only disables the noisemakers, so you still get the benefit of your flashers, button lights, and other quiet devices. This lets you play late into the night without disturbing your housemates or neighbors.
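The per-port "noisy" designation can be pictured as a simple filter on the output levels. The sketch below is only an illustration under assumed names (`OutputPort`, `effectiveLevel`, `computeLevels`); the real firmware's data structures may look quite different.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical output port: the level requested by the PC (0-255) and
// a flag marking the attached device as a noisemaker.
struct OutputPort {
    uint8_t level;
    bool noisy;
};

// Level to actually drive on the port.  Noisy ports are forced off
// while night mode is engaged; quiet ports are unaffected.
uint8_t effectiveLevel(const OutputPort &port, bool nightMode)
{
    return (nightMode && port.noisy) ? static_cast<uint8_t>(0) : port.level;
}

// Apply the rule to a whole bank of ports, writing results to 'out',
// which must have room for 'count' entries.
void computeLevels(const OutputPort *ports, size_t count,
                   bool nightMode, uint8_t *out)
{
    assert(ports != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i)
        out[i] = effectiveLevel(ports[i], nightMode);
}
```

A solenoid port flagged as noisy would read zero while night mode is on, while a flasher port on the same bank keeps whatever level the PC requested.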

Gamma correction: you can designate individual output ports for gamma correction. This adjusts the intensity level of an output to make it match the way the human eye perceives brightness, so that fades and color mixes look more natural in lighting devices. You can apply this to individual ports, so that it only affects ports that actually have lights of some kind attached.
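As a rough illustration of what gamma correction does to an output level, the sketch below maps a linear 8-bit brightness value through a power curve. The function name `gammaCorrect` and the exponent of 2.2 are assumptions chosen for the example, not values taken from the Pinscape firmware, which may use a different exponent or a precomputed lookup table.

```cpp
#include <assert.h>
#include <stdint.h>
#include <cmath>

// Map a linear 8-bit level (0-255) to a gamma-corrected 8-bit level.
// The default exponent is an illustrative assumption.
uint8_t gammaCorrect(uint8_t level, double gamma = 2.2)
{
    assert(gamma > 0.0);

    // Normalize to 0..1, apply the power curve, and scale back to 0..255.
    double normalized = level / 255.0;
    double corrected = std::pow(normalized, gamma);
    return static_cast<uint8_t>(std::lround(corrected * 255.0));
}
```

The endpoints are preserved (0 stays 0, 255 stays 255), while mid-range values are pulled downward, which is what makes a linear fade from the PC look more even to the eye. On a small microcontroller a 256-entry table computed once would usually be preferable to calling `pow` on every update.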

IR Remote Control: the controller software can transmit and/or receive IR remote control commands if you attach appropriate parts (an IR LED to send, an IR sensor chip to receive). This can be used to turn on your TV(s) when the system powers on, if they don't turn on automatically, and for any other functions you can think of requiring IR send/receive capabilities. You can assign IR commands to cabinet buttons, so that pressing a button on your cabinet sends a remote control command from the attached IR LED, and you can have the controller generate virtual key presses on your PC in response to received IR commands. If you have the IR sensor attached, the system can use it to learn commands from your existing remotes.

Yet more USB fixes: I've been gradually finding and fixing USB bugs in the mbed library for months now. This version has all of the fixes of the last couple of releases, of course, plus some new ones. It also has a new "last resort" feature, since there always seems to be "just one more" USB bug. The last resort is that you can tell the device to automatically reboot itself if it loses the USB connection and can't restore it within a given time limit.
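The "last resort" reboot amounts to a timeout check in the main loop. The sketch below shows only the decision logic, with hypothetical names (`shouldReboot`, `msSinceDisconnect`, `timeoutMs`) and an assumed convention that a timeout of zero disables the feature; it is not the firmware's actual code.

```cpp
#include <stdint.h>

// Decide whether the device should reboot itself.
//   usbConnected      - true if the USB connection is currently up
//   msSinceDisconnect - time elapsed since the connection was lost
//   timeoutMs         - configured limit; 0 disables the feature
//                       (an assumption for this sketch)
bool shouldReboot(bool usbConnected, uint32_t msSinceDisconnect,
                  uint32_t timeoutMs)
{
    if (usbConnected || timeoutMs == 0)
        return false;
    return msSinceDisconnect >= timeoutMs;
}
```

On a Cortex-M part such as the KL25Z, the reset itself would typically be triggered with the CMSIS call `NVIC_SystemReset()` once this check returns true.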

More Downloads

  • Custom VP builds: I created modified versions of Visual Pinball 9.9 and Physmod5 that you might want to use in combination with this controller. The modified versions have special handling for plunger calibration specific to the Pinscape Controller, as well as some enhancements to the nudge physics. If you're not using the plunger, you might still want it for the nudge improvements. The modified version also works with any other input controller, so you can get the enhanced nudging effects even if you're using a different plunger/nudge kit. The big change in the modified versions is a "filter" for accelerometer input that's designed to make the response to cabinet nudges more realistic. It also makes the response more subdued than in the standard VP, so it's not to everyone's taste. The downloads include both the updated executables and the source code changes, in case you want to merge the changes into your own custom version(s).

    Note! These features are now standard in the official VP releases, so you don't need my custom builds if you're using 9.9.1 or later and/or VP 10. I don't think there's any reason to use my versions instead of the latest official ones, and in fact I'd encourage you to use the official releases since they're more up to date, but I'm leaving my builds available just in case. In the official versions, look for the checkbox "Enable Nudge Filter" in the Keys preferences dialog. My custom versions don't include that checkbox; they just enable the filter unconditionally.
  • Output circuit shopping list: This is a saved shopping cart at mouser.com with the parts needed to build one copy of the high-power output circuit for the LedWiz emulator feature, for use with the standalone KL25Z (that is, without the expansion boards). The quantities in the cart are for one output channel, so if you want N outputs, simply multiply the quantities by N, with one exception: you only need one ULN2803 transistor array chip for each eight output circuits. If you're using the expansion boards, you won't need any of this, since the boards provide their own high-power outputs.
  • Cary Owens' optical sensor housing: A 3D-printable design for a housing/mounting bracket for the optical plunger sensor, designed by Cary Owens. This makes it easy to mount the sensor.
  • Lemming77's potentiometer mounting bracket and shooter rod connector: Sketchup designs for 3D-printable parts for mounting a slide potentiometer as the plunger sensor. These were designed for a particular slide potentiometer that used to be available from an Aliexpress.com seller but is no longer listed. You can probably use this design as a starting point for other similar devices; just check the dimensions before committing the design to plastic.
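The quantity rule for the output circuit shopping list above can be written out as a small calculation. The function names here are hypothetical, and rounding the ULN2803 count up to the next whole chip is an assumption that follows from "one chip for each eight output circuits".

```cpp
#include <assert.h>

// Quantity of an ordinary per-channel part for n output channels:
// the cart quantity (which is for one channel) times n.
int perChannelQty(int qtyInCart, int n)
{
    assert(qtyInCart >= 0 && n >= 0);
    return qtyInCart * n;
}

// Number of ULN2803 chips for n output channels: one chip serves up
// to eight output circuits, so round n/8 up to a whole chip.
int uln2803Count(int n)
{
    assert(n >= 0);
    return (n + 7) / 8;
}
```

For example, 12 outputs need two ULN2803 chips, and a part listed with a quantity of 2 in the cart would be ordered 24 times.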

Copyright and License

The Pinscape firmware is copyright 2014, 2021 by Michael J Roberts. It's released under an MIT open-source license. See License.

Warning to VirtuaPin Kit Owners

This software isn't designed as a replacement for the VirtuaPin plunger kit's firmware. If you bought the VirtuaPin kit, I recommend that you don't install this software. The KL25Z can only run one firmware program at a time, so if you install the Pinscape firmware on your KL25Z, it will replace and erase your existing VirtuaPin proprietary firmware. If you do this, the only way to restore your VirtuaPin firmware is to physically ship the KL25Z back to VirtuaPin and ask them to re-flash it. They don't allow you to do this at home, and they don't even allow you to back up your firmware, since they want to protect their proprietary software from copying. For all of these reasons, if you want to run the Pinscape software, I strongly recommend that you buy a "blank" retail KL25Z to use with Pinscape. They only cost about $15 and are available at several online retailers, including Amazon, Mouser, and eBay. The blank retail boards don't come with any proprietary firmware pre-installed, so installing Pinscape won't delete anything that you paid extra for.

With those warnings in mind, if you're absolutely sure that you don't mind permanently erasing your VirtuaPin firmware, it is at least possible to use Pinscape as a replacement for the VirtuaPin firmware. Pinscape uses the same button wiring conventions as the VirtuaPin setup, so you can keep your buttons (although you'll have to update the GPIO pin mappings in the Config Tool to match your physical wiring). As of the June 2021 firmware, the Vishay VCNL4010 plunger sensor that comes with the VirtuaPin v3 plunger kit is supported, so you can also keep your plunger, if you have that chip. (If keeping the plunger sensor is important to you, check that this is the sensor chip you have before committing to this route. The older VirtuaPin plunger kits came with different IR sensors that the Pinscape software doesn't handle.)

Committer:
mjr
Date:
Thu Mar 23 05:19:05 2017 +0000
Parent:
78:1e00b3fa11af
Child:
80:94dc2946871b
Commit message:
FTFA/Ticker issue fixed (by removing Ticker, changing to Timeout); new "flash write succeeded" status flag; optical plunger rounding improvements

Changed in this revision

FreescaleIAP/FreescaleIAP.cpp
FreescaleIAP/FreescaleIAP.h
FreescaleIAP/IAP.s
NewMalloc/NewMalloc.cpp
NewMalloc/NewMalloc.h
NewMalloc/OpNew.cpp
NewPwm/NewPwm.h
TLC5940/TLC5940.h
USBProtocol.h
ccdSensor.h
main.cpp
mbed_Fixes/KLXX_us_ticker_fix.c
mbed_Fixes/ticker_api_fix.c
nvm.h
--- a/FreescaleIAP/FreescaleIAP.cpp	Sun Mar 19 05:30:53 2017 +0000
+++ b/FreescaleIAP/FreescaleIAP.cpp	Thu Mar 23 05:19:05 2017 +0000
@@ -1,67 +1,55 @@
-// FreescaleIAP, private version
+// FreescaleIAP - custom version
 //
-// This is a heavily modified version of Erik Olieman's FreescaleIAP, a
-// flash memory writer for Freescale boards.  This version is adapted to
-// the special needs of the KL25Z.
-//
-// Simplifications:
+// This is a simplified version of Erik Olieman's FreescaleIAP, a flash 
+// memory writer for Freescale boards.  This version combines erase, write,
+// and verify into a single API call.  The caller only has to give us a
+// buffer (of any length) to write, and the address to write it to, and
+// we'll do the whole thing - essentially a memcpy() to flash.
 //
-// Unlike EO's original version, this version combines erase and write
-// into a single opreation, so the caller can simply give us a buffer
-// and a location, and we'll write it, including the erase prep.  We
-// don't need to be able to separate the operations, so the combined
-// interface is simpler at the API level and also lets us do all of the
-// interrupt masking in one place (see below).
-//
-// Stability improvements:
+// This version uses an assembler implementation of the core code that
+// launches an FTFA command and waits for completion, to minimize the
+// size of the code and to ensure that it's placed in RAM.  The KL25Z
+// flash controller prohibits any flash reads while an FTFA command is
+// executing.  This includes instruction fetches; any instruction fetch
+// from flash while an FTFA command is running will fail, which will 
+// freeze the CPU.  Placing the execute/wait code in RAM ensures that
+// the wait loop itself won't trigger a fetch.  It's also vital to disable
+// interrupts while the execute/wait code is running, to ensure that we
+// don't jump to an ISR in flash during the wait.
 //
-// The KL25Z has an important restriction on flash writing that makes it
-// very delicate.  Specifically, the flash controller (FTFA) doesn't allow 
-// any read operations while a sector erase is in progress.  This complicates
-// things for a KL25Z app because all program code is stored in flash by 
-// default.  This means that every instruction fetch is a flash read.  The
-// FTFA's response to a read while an erase is in progress is to fail the
-// read.  When the read is actually an instruction fetch, this results in
-// CPU lockup.  Making this even more complicated, the erase operation can
-// only operate on a whole sector at a time, which takes on the order of 
-// milliseconds, which is a very long time for the CPU to go without any
-// instruction fetches.  Even if the code that initiates the erase is 
-// located in RAM and is very careful to loop within the RAM code block,
-// any interrupt could take us out of the RAM loop and trigger a fetch
-// on a flash location.
-//
-// We use two strategies to avoid flash fetches while we're working.
-// First, the code that performs all of the FTFA operations is written
-// in assembly, in a module AREA marked READWRITE.  This forces the
-// linker to put the code in RAM.  The code could otherwise just have
-// well been written in C++, but as far as I know there's no way to tell
-// the mbed C++ compiler to put code in RAM.  Since the FTFA code is all
-// in RAM, it doesn't by itself trigger any flash fetches as it executes,
-// so we're left with interrupts as the only concern.  Second, we explicitly 
-// disable all of the peripheral interrupts that we use anywhere in the 
-// program (USB, all the timers, GPIO ports, etc) via the NVIC.  From
-// testing, it's clear that disabling interrupts at the CPU level via
-// __disable_irq() (or the equivalent assembly instruction CPSID I) isn't
-// enough.  We have to turn interrupts off at the peripheral (NVIC) level.
-// I'm really not sure why this is required, since you'd think the CPSID I
-// masking would be enough, but experimentally it's clearly not.  This is
-// a detail of ARM hardware architecture that I need to look into more,
-// since it leaves me uneasy that there might be even more subtleties 
-// left to uncover.   But at least things seem very stable after blocking
-// interrupts at the NVIC level.
+// Despite the dire warnings in the hardware reference manual about putting
+// the FTFA execute/wait code in RAM, it doesn't actually appear to be
+// necessary, as long as the wait loop is very small (in terms of machine
+// code instruction count).  In testing, Erik has found that a flash-resident
+// version of the code is stable, and further found (by testing combinations
+// of cache control settings via the platform control register, MCM_PLACR)
+// that the stability comes from the loop fitting into CPU cache, which
+// allows the loop to execute without any fetches taking place.  Even so,
+// I'm keeping the RAM version, out of an abundance of caution: just in
+// case there are any rare or oddball conditions (interrupt timing, say) 
+// where the cache trick breaks.  Putting the code in RAM seems pretty
+// much guaranteed to work, whereas the cache trick seems somewhat to be
+// relying on a happy accident, and I personally don't know the M0+ 
+// architecture well enough to be able to convince myself that it really
+// will work under all conditions.  There doesn't seem to be any benefit
+// to not using the assembler, either, as it's very simple code and takes
+// up little RAM (about 40 bytes).
+
 
 #include "FreescaleIAP.h"
- 
+
 //#define IAPDEBUG
 
 // assembly interface
 extern "C" {
-    void iapEraseSector(FTFA_Type *ftfa, uint32_t address);
-    void iapProgramBlock(FTFA_Type *ftfa, uint32_t address, const void *src, uint32_t length);
+    // Execute the current FTFA command and wait for completion.
+    // This is an assembler implementation that runs entirely in RAM,
+    // to ensure strict compliance with the prohibition on reading
+    // flash (for instruction fetches or any other reason) during FTFA 
+    // execution.
+    void iapExecAndWait();
 }
 
-
- 
 enum FCMD {
     Read1s = 0x01,
     ProgramCheck = 0x02,
@@ -75,116 +63,47 @@
     VerifyBackdoor = 0x45
 };
 
-
-/* Check if an error occured 
-   Returns error code or Success*/
-static IAPCode check_error(void) 
-{
-    if (FTFA->FSTAT & FTFA_FSTAT_FPVIOL_MASK) {
-        #ifdef IAPDEBUG
-        printf("IAP: Protection violation\r\n");
-        #endif
-        return ProtectionError;
-    }
-    if (FTFA->FSTAT & FTFA_FSTAT_ACCERR_MASK) {
-        #ifdef IAPDEBUG
-        printf("IAP: Flash access error\r\n");
-        #endif
-        return AccessError;
-    }
-    if (FTFA->FSTAT & FTFA_FSTAT_RDCOLERR_MASK) {
-        #ifdef IAPDEBUG
-        printf("IAP: Collision error\r\n");
-        #endif
-        return CollisionError;
-    }
-    if (FTFA->FSTAT & FTFA_FSTAT_MGSTAT0_MASK) {
-        #ifdef IAPDEBUG
-        printf("IAP: Runtime error\r\n");
-        #endif
-        return RuntimeError;
-    }
-    #ifdef IAPDEBUG
-    printf("IAP: No error reported\r\n");
-    #endif
-    return Success;
-}
- 
-IAPCode FreescaleIAP::program_flash(int address, const void *src, unsigned int length) 
-{    
-    #ifdef IAPDEBUG
-    printf("IAP: Programming flash at %x with length %d\r\n", address, length);
-    #endif
-                
-    // presume success
-    IAPCode status = Success;
-
-    // I'm not 100% convinced this is 100% reliable yet.  So let's show
-    // some diagnostic lights while we're working.  If anyone sees any
-    // freezes, the lights that are left on at the freeze will tell us
-    // which step is crashing.
-    extern void diagLED(int,int,int);
-    
-    // Erase the sector(s) covered by the write.  Before writing, we must
-    // erase each sector that we're going to touch on the write.
-    for (uint32_t ofs = 0 ; ofs < length ; ofs += SECTOR_SIZE)
-    {
-        // Show RED on the first sector, GREEN on second, BLUE on third.  Each
-        // sector is 1K, so I don't think we'll need more than 3 for the 
-        // foreseeable future.  (RAM on the KL25Z is so tight that it will
-        // probably stop us from adding enough features to require more
-        // configuration variables than 3K worth.)
-        diagLED(ofs/SECTOR_SIZE == 0, ofs/SECTOR_SIZE == 1, ofs/SECTOR_SIZE == 2);
-        
-        // erase the sector
-        iapEraseSector(FTFA, address + ofs);
-    }
-        
-    // If the erase was successful, write the data.
-    if ((status = check_error()) == Success)
-    {
-        // show cyan while the write is in progress
-        diagLED(0, 1, 1);
-
-        // do the write
-        iapProgramBlock(FTFA, address, src, length);
-        
-        // purple when done
-        diagLED(1, 0, 1);
-        
-        // check again for errors
-        status = check_error();
-    }
-    
-    // return the result
-    return status;
-}
- 
-uint32_t FreescaleIAP::flash_size(void) 
+// Get the size of the flash memory on the device
+uint32_t FreescaleIAP::flashSize(void) 
 {
     uint32_t retval = (SIM->FCFG2 & 0x7F000000u) >> (24-13);
     if (SIM->FCFG2 & (1<<23))           // Possible second flash bank
         retval += (SIM->FCFG2 & 0x007F0000u) >> (16-13);
     return retval;
 }
- 
-/* Check if no flash boundary is violated
-   Returns true on violation */
-bool check_boundary(int address, unsigned int length) 
+
+// Check if an error occurred
+static FreescaleIAP::IAPCode checkError(void) 
 {
-    int temp = (address+length - 1) / SECTOR_SIZE;
-    address /= SECTOR_SIZE;
-    bool retval = (address != temp);
-    #ifdef IAPDEBUG
-    if (retval)
-        printf("IAP: Boundary violation\r\n");
-    #endif
-    return retval;
+    if (FTFA->FSTAT & FTFA_FSTAT_FPVIOL_MASK) {
+        #ifdef IAPDEBUG
+        printf("IAP: Protection violation\r\n");
+        #endif
+        return FreescaleIAP::ProtectionError;
+    }
+    if (FTFA->FSTAT & FTFA_FSTAT_ACCERR_MASK) {
+        #ifdef IAPDEBUG
+        printf("IAP: Flash access error\r\n");
+        #endif
+        return FreescaleIAP::AccessError;
+    }
+    if (FTFA->FSTAT & FTFA_FSTAT_RDCOLERR_MASK) {
+        #ifdef IAPDEBUG
+        printf("IAP: Collision error\r\n");
+        #endif
+        return FreescaleIAP::CollisionError;
+    }
+    if (FTFA->FSTAT & FTFA_FSTAT_MGSTAT0_MASK) {
+        #ifdef IAPDEBUG
+        printf("IAP: Runtime error\r\n");
+        #endif
+        return FreescaleIAP::RuntimeError;
+    }
+    return FreescaleIAP::Success;
 }
- 
-/* Check if address is correctly aligned
-   Returns true on violation */
-bool check_align(int address) 
+
+// check for proper address alignment
+static bool checkAlign(int address) 
 {
     bool retval = address & 0x03;
     #ifdef IAPDEBUG
@@ -193,4 +112,190 @@
     #endif
     return retval;
 }
- 
+
+// clear errors in the FTFA
+static void clearErrors()
+{
+    // wait for any previous command to complete    
+    while (!(FTFA->FSTAT & FTFA_FSTAT_CCIF_MASK)) ;
+
+    // clear the error bits
+    if (FTFA->FSTAT & (FTFA_FSTAT_ACCERR_MASK | FTFA_FSTAT_FPVIOL_MASK))
+        FTFA->FSTAT |= FTFA_FSTAT_ACCERR_MASK | FTFA_FSTAT_FPVIOL_MASK;
+}
+
+static FreescaleIAP::IAPCode eraseSector(int address) 
+{
+    #ifdef IAPDEBUG
+    printf("IAP: Erasing sector at %x\r\n", address);
+    #endif
+
+    // ensure proper alignment
+    if (checkAlign(address))
+        return FreescaleIAP::AlignError;
+    
+    // clear errors
+    clearErrors();
+    
+    // Set up the command
+    FTFA->FCCOB0 = EraseSector;
+    FTFA->FCCOB1 = (address >> 16) & 0xFF;
+    FTFA->FCCOB2 = (address >> 8) & 0xFF;
+    FTFA->FCCOB3 = address & 0xFF;
+    
+    // execute
+    iapExecAndWait();
+    
+    // check the result
+    return checkError();
+}
+
+static FreescaleIAP::IAPCode verifySectorErased(int address)
+{
+    // Always verify in whole sectors.  The
+    const unsigned int count = SECTOR_SIZE/4;
+
+    #ifdef IAPDEBUG
+    printf("IAP: Verify erased at %x, %d longwords (%d bytes)\r\n", address, count, count*4);
+    #endif
+    
+    if (checkAlign(address))
+        return FreescaleIAP::AlignError;
+
+    // clear errors
+    clearErrors();
+    
+    // Set up command
+    FTFA->FCCOB0 = Read1s;
+    FTFA->FCCOB1 = (address >> 16) & 0xFF;
+    FTFA->FCCOB2 = (address >> 8) & 0xFF;
+    FTFA->FCCOB3 = address & 0xFF;
+    FTFA->FCCOB4 = (count >> 8) & 0xFF;
+    FTFA->FCCOB5 = count & 0xFF;
+    FTFA->FCCOB6 = 0;
+
+    // execute    
+    iapExecAndWait();
+    
+    // check the result
+    FreescaleIAP::IAPCode retval = checkError();
+    if (retval == FreescaleIAP::RuntimeError) {
+        #ifdef IAPDEBUG
+        printf("IAP: Flash was not erased\r\n");
+        #endif
+        return FreescaleIAP::EraseError;
+    }
+    return retval;       
+}
+
+// Write one sector.  This always writes a full sector, even if the
+// requested length is greater or less than the sector size:
+//
+// - if len > SECTOR_SIZE, we write the first SECTOR_SIZE bytes of the data
+//
+// - if len < SECTOR_SIZE, we write the data, then fill in the rest of the
+//   sector with 0xFF bytes ('1' bits)
+//
+
+static FreescaleIAP::IAPCode writeSector(int address, const uint8_t *p, int len)
+{    
+    #ifdef IAPDEBUG
+    printf("IAP: Writing sector at %x with length %d\r\n", address, len);
+    #endif
+
+    // program the sector, one longword (32 bits) at a time
+    for (int ofs = 0 ; ofs < SECTOR_SIZE ; ofs += 4, address += 4, p += 4, len -= 4)
+    {
+        // clear errors
+        clearErrors();
+        
+        // Set up the command
+        FTFA->FCCOB0 = ProgramLongword;
+        FTFA->FCCOB1 = (address >> 16) & 0xFF;
+        FTFA->FCCOB2 = (address >> 8) & 0xFF;
+        FTFA->FCCOB3 = address & 0xFF;
+        
+        // Load the longword to write.  If we're past the end of the source
+        // data, write all '1' bits to the balance of the sector.
+        FTFA->FCCOB4 = len > 3 ? p[3] : 0xFF;
+        FTFA->FCCOB5 = len > 2 ? p[2] : 0xFF;
+        FTFA->FCCOB6 = len > 1 ? p[1] : 0xFF;
+        FTFA->FCCOB7 = len > 0 ? p[0] : 0xFF;
+        
+        // execute
+        iapExecAndWait();
+        
+        // check errors
+        FreescaleIAP::IAPCode status = checkError();
+        if (status != FreescaleIAP::Success)
+            return status;
+    }
+    
+    // no problems
+    return FreescaleIAP::Success;
+}
+
+// Program a block of memory into flash. 
+FreescaleIAP::IAPCode FreescaleIAP::programFlash(
+    int address, const void *src, unsigned int length) 
+{    
+    #ifdef IAPDEBUG
+    printf("IAP: Programming flash at %x with length %d\r\n", address, length);
+    #endif
+    
+    // presume success
+    FreescaleIAP::IAPCode status = FreescaleIAP::Success;
+    
+    // Show diagnostic LED colors while writing.  I'm finally convinced this
+    // is well and truly 100% reliable now, but I've been wrong before, so
+    // we'll keep this for now.  The idea is that if we freeze up, we'll at
+    // least know which stage we're at from the last color displayed.
+    extern void diagLED(int,int,int);
+    
+    // try a few times if we fail to verify
+    for (int tries = 0 ; tries < 5 ; ++tries)
+    {
+        // Do the write one sector at a time
+        int curaddr = address;
+        const uint8_t *p = (const uint8_t *)src;
+        int rem = (int)length;
+        for ( ; rem > 0 ; curaddr += SECTOR_SIZE, p += SECTOR_SIZE, rem -= SECTOR_SIZE)
+        {
+            // erase the sector (red LED)
+            diagLED(1, 0, 0);
+            if ((status = eraseSector(curaddr)) != FreescaleIAP::Success)
+                break;
+            
+            // verify that the sector is erased (yellow LED)
+            diagLED(1, 1, 0);
+            if ((status = verifySectorErased(curaddr)) != FreescaleIAP::Success)
+                break;
+            
+            // write the data (white LED)
+            diagLED(1, 1, 1);
+            if ((status = writeSector(curaddr, p, rem)) != FreescaleIAP::Success)
+                break;
+                
+            // back from write (purple LED)
+            diagLED(1, 0, 1);
+        }
+        
+        // if we didn't encounter an FTFA error, verify the write
+        if (status == FreescaleIAP::Success)
+        {
+            // Verify the write.  If it was successful, we're done.
+            if (memcmp((void *)address, src, length) == 0)
+                break;
+                
+            // We have a mismatch between the flash data and the source.
+            // Flag the error and go back for another attempt.
+            status = FreescaleIAP::VerifyError;
+        }
+    }
+    
+    __enable_irq();
+        
+    // return the result
+    return status;
+}
+
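The Program Longword setup in writeSector() above can be sketched off-target as plain byte packing. This is a minimal illustration, not the firmware's code: the `Fccob` struct and `packProgramLongword()` are hypothetical names standing in for the FTFA->FCCOB0..FCCOB7 registers, and the command code 6 is taken from the assembly version removed elsewhere in this change. It shows the 24-bit address split across FCCOB1..3, the data bytes loaded in reverse order, and the 0xFF fill past the end of the source data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the FTFA command registers FCCOB0..FCCOB7.
struct Fccob {
    uint8_t b[8];
};

// Pack a Program Longword command the way the writeSector() loop does.
// 'len' is the number of source bytes still available at 'p'; bytes past
// the end of the source are filled with 0xFF (erased-state '1' bits).
inline Fccob packProgramLongword(uint32_t address, const uint8_t *p, int len)
{
    // assumption for this sketch: the same longword alignment rule that
    // checkAlign() enforces before an erase
    assert((address & 0x03) == 0);

    Fccob c;
    c.b[0] = 0x06;                      // Program Longword command code
    c.b[1] = (address >> 16) & 0xFF;    // address bits 16-23
    c.b[2] = (address >> 8) & 0xFF;     // address bits 8-15
    c.b[3] = address & 0xFF;            // address bits 0-7
    c.b[4] = len > 3 ? p[3] : 0xFF;     // data byte 3, or fill
    c.b[5] = len > 2 ? p[2] : 0xFF;     // data byte 2, or fill
    c.b[6] = len > 1 ? p[1] : 0xFF;     // data byte 1, or fill
    c.b[7] = len > 0 ? p[0] : 0xFF;     // data byte 0, or fill
    return c;
}
```

With only two source bytes left, the upper two data registers receive 0xFF, which leaves those flash bits in their erased state.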
--- a/FreescaleIAP/FreescaleIAP.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/FreescaleIAP/FreescaleIAP.h	Thu Mar 23 05:19:05 2017 +0000
@@ -6,23 +6,16 @@
 #include "mbed.h"
 #include "FreescaleIAP.h"
   
-int main() {
-    int address = flash_size() - SECTOR_SIZE;           //Write in last sector
+int main() 
+{
+    int address = flashSize() - SECTOR_SIZE;           //Write in last sector
     
     int *data = (int*)address;
     printf("Starting\r\n"); 
     erase_sector(address);
     int numbers[10] = {0, 1, 10, 100, 1000, 10000, 1000000, 10000000, 100000000, 1000000000};
-    program_flash(address, (char*)&numbers, 40);        //10 integers of 4 bytes each: 40 bytes length
-    printf("Resulting flash: \r\n");
-    for (int i = 0; i<10; i++)
-        printf("%d\r\n", data[i]);
-    
-    printf("Done\r\n\n");
-        
- 
-    while (true) {
-    }
+    programFlash(address, (char*)&numbers, 40);        //10 integers of 4 bytes each: 40 bytes length
+    while (true) ;
 }
 
 */
@@ -42,22 +35,22 @@
 #define SECTOR_SIZE     1024
 #endif
 
-enum IAPCode {
-    BoundaryError = -99,    //Commands may not span several sectors
-    AlignError,             //Data must be aligned on longword (two LSBs zero)
-    ProtectionError,        //Flash sector is protected
-    AccessError,            //Something went wrong
-    CollisionError,         //During writing something tried to flash which was written to
-    LengthError,            //The length must be multiples of 4
-    RuntimeError,           
-    EraseError,             //The flash was not erased before writing to it
-    Success = 0
-};
- 
-
 class FreescaleIAP
 {
 public:
+    enum IAPCode {
+        BoundaryError = -99,    // Commands may not span several sectors
+        AlignError,             // Data must be aligned on longword (two LSBs zero)
+        ProtectionError,        // Flash sector is protected
+        AccessError,            // Something went wrong
+        CollisionError,         // During writing something tried to flash which was written to
+        LengthError,            // The length must be multiples of 4
+        RuntimeError,           // FTFA runtime error reports
+        EraseError,             // The flash was not erased before writing to it
+        VerifyError,            // The data read back from flash didn't match what we wrote
+        Success = 0
+    };
+
     FreescaleIAP() { }
     ~FreescaleIAP() { }
  
@@ -68,7 +61,7 @@
      * @param length number of bytes to program (must be a multiple of 4)
      * @param return Success if no errors were encountered, otherwise one of the error states
      */
-    IAPCode program_flash(int address, const void *data, unsigned int length);
+    IAPCode programFlash(int address, const void *data, unsigned int length);
      
     /**
      * Returns size of flash memory
@@ -77,7 +70,7 @@
      *
      * @param return length of flash memory in bytes
      */
-    uint32_t flash_size(void);
+    uint32_t flashSize(void);
     
 private:
     // program a word of flash
--- a/FreescaleIAP/IAP.s	Sun Mar 19 05:30:53 2017 +0000
+++ b/FreescaleIAP/IAP.s	Thu Mar 23 05:19:05 2017 +0000
@@ -1,217 +1,60 @@
 ; FreescaleIAP assembly functions
 ;
-    AREA iap_main_asm_code, CODE, READONLY
-    
-;---------------------------------------------------------------------------
-; iapEraseSector(FTFA_Type *FTFA, uint32_t address)
-;   R0 = FTFA pointer
-;   R1 = starting address
-
-    EXPORT iapEraseSector
-iapEraseSector
-    ; save registers
-    STMFD   R13!,{R1,R4,LR}
-    
-    ; wait for any previous command to complete
-    BL      iapWait
-    
-    ; clear any errors
-    BL      iapClearErrors
-    
-    ; set up the command parameters
-    MOVS    R4,#0
-    STRB    R4,[R0,#1]   ; FTFA->FCNFG <- 0
-    MOVS    R4,#9        ; command = erase sector (9)
-    STRB    R4,[R0,#7]   ; FTFA->FCCOB0 <- command
-    
-    STRB    R1,[R0,#4]   ; FTFA->FCCOB3 <- address bits 16-23
-    
-    MOVS    R1,R1,LSR #8 ; address >>= 8
-    STRB    R1,[R0,#5]   ; FTFA->FCCOB2 <- address bits 8-15
-    
-    MOVS    R1,R1,LSR #8 ; address >>= 8
-    STRB    R1,[R0,#6]   ; FTFA->FCCOB1 <- address bits 0-7
-    
-    ; execute (and wait for completion)
-    BL      iapExecAndWait
-    
-    ; pop registers and return
-    LDMFD   R13!,{R1,R4,PC}    
-
-;---------------------------------------------------------------------------
-; iapProgramBlock(TFA_Type *ftfa, uint32_t address, const void *src, uint32_t length)
-;   R0 = FTFA pointer
-;   R1 = flash address
-;   R2 = source data pointer
-;   R3 = data length in bytes
-
-    EXPORT iapProgramBlock
-iapProgramBlock
-    ; save registers
-    STMFD   R13!, {R1,R2,R3,R4,LR}
-    
-    ; wait for any previous command to complete
-    BL      iapWait
-    
-    ; iterate over the data
-LpLoop
-    CMPS    R3,#3        ; at least one longword left (>= 4 bytes)?
-    BLS     LpDone       ; no, done
-    
-    ; clear any errors from the previous command
-    BL      iapClearErrors
-    
-    ; set up the command parameters
-    MOVS    R4,#0
-    STRB    R4,[R0,#1]   ; FTFA->FCNFG <- 0
-    MOVS    R4,#6        ; command = program longword (6)
-    STRB    R4,[R0,#7]   ; FTFA->FCCOB0 <- command
-    
-    MOVS    R4,R1        ; R4 <- current address
-    STRB    R4,[R0,#4]   ; FTFA->FCCOB3 <- address bits 16-23
-    
-    MOVS    R4,R4,LSR #8 ; address >>= 8
-    STRB    R4,[R0,#5]   ; FTFA->FCCOB2 <- address bits 8-15
-
-    MOVS    R4,R4,LSR #8 ; address >>= 8    
-    STRB    R4,[R0,#6]   ; FTFA->FCCOB1 <- address bits 0-7
-    
-    LDRB    R4,[R2]      ; R4 <- data[0]
-    STRB    R4,[R0,#8]   ; FTFA->FCCOB7 <- data[0]
-    
-    LDRB    R4,[R2,#1]   ; R4 <- data[1]
-    STRB    R4,[R0,#9]   ; FTFA->FCCOB6 <- data[1]
-    
-    LDRB    R4,[R2,#2]   ; R4 <- data[2]
-    STRB    R4,[R0,#0xA] ; FTFA->FCCOB5 <- data[2]
-    
-    LDRB    R4,[R2,#3]   ; R4 <- data[3]
-    STRB    R4,[R0,#0xB] ; FTBA->FCCOB4 <- data[3]
-    
-    ; execute the command
-    BL      iapExecAndWait
-    
-    ; advance to the next longword
-    ADDS    R1,R1,#4     ; flash address += 4
-    ADDS    R2,R2,#4     ; source data pointer += 4
-    SUBS    R3,R3,#4     ; data length -= 4
-    B       LpLoop       ; back for the next iteration
-    
-LpDone    
-    ; pop registers and return
-    LDMFD   R13!, {R1,R2,R3,R4,PC}    
-
-    
-;---------------------------------------------------------------------------
-; iapClearErrors(FTFA_Type *FTFA) - clear errors from previous command
-;   R0 = FTFA pointer
-
-iapClearErrors
-    ; save registers
-    STMFD   R13!, {R2,R3,LR}
-
-    LDRB    R2, [R0]    ; R2 <- FTFA->FSTAT
-    MOVS    R3, #0x30   ; FPVIOL (0x10) | ACCERR (0x20)
-    ANDS    R2, R2, R3  ; R2 &= error bits
-    BEQ     Lc0         ; if all zeros, no need to reset anything
-    STRB    R2, [R0]    ; write the 1 bits back to clear the error status
-Lc0
-    ; restore registers and return
-    LDMFD   R13!, {R2,R3,PC}    
+; The hardware manual warns that FTFA commands must be executed entirely
+; from RAM code, since we can't have any flash reads occur while an erase 
+; or write operation is executing.  If the code executing and waiting for
+; the FTFA command were in flash, the CPU might have to fetch an instruction
+; from flash in the course of the loop, which could freeze the CPU.  
+; Empirically, it seems that this isn't truly necessary, despite the manual's
+; warnings.  The M0+ instruction cache is big enough to hold the whole
+; execute and wait loop instruction sequence, even when written in C++, so
+; in practice this can run as flash-resident C++ code.  We're implementing
+; it as assembler anyway to follow the best practices as laid out in the
+; hardware manual.
+;
+; Tell the linker to put our code in RAM by making it read-write.
+    AREA iap_ram_asm_code, CODE, READWRITE
 
 
-;---------------------------------------------------------------------------
-; iapWait(FTFA_Type *FTFA) - wait for command to complete
-;   R0 = FTFA pointer
-
-iapWait
-    ; save registers
-    STMFD   R13!, {R1,R2,LR}
-
-    ; the CCIF bit is SET when the command completes
-Lw0
-    LDRB    R1, [R0]     ; R1 <- FTFA->FSTAT
-    MOVS    R2, #0x80    ; CCIF (0x80)
-    TSTS    R1, R2       ; test R1 & CCIF
-    BEQ     Lw0          ; if zero, the command is still running
-
-    ; pop registers and return
-    LDMFD   R13!, {R1,R2,PC}    
-
-
-;---------------------------------------------------------------------------
+; iapExecAndWait()
 ;
-; The iapExecAndWait function MUST NOT BE IN FLASH, since we can't have
-; any flash reads occur while an erase or write operation is executing.  If
-; the code were in flash, the CPU might have to fetch an instruction from
-; flash in the course of the loop, which could freeze the CPU.  Force the
-; linker to put this section in RAM by making it read-write.
-
-    AREA iap_ram_asm_code, CODE, READWRITE
-
-;---------------------------------------------------------------------------
-;
-; iapExecAndWait(FTFA_Type *FTFA)
-;   R0 = FTFA pointer
-;
-; This sets the bit in the FTFA status register to launch execution
-; of the command currently configured in the control registers.  The
-; caller must set up the control registers with the command code, and
-; any address data parameters requied for the command.  After launching
-; the command, we loop until the FTFA signals command completion.
-;
-; This routine turns off CPU interrupts and disables all peripheral
-; interrupts through the NVIC while the command is executing.  That
-; should eliminate any possibility of a hardware interrupt triggering
-; a flash fetch during a programming operation.  We restore interrupts
-; on return.  The caller doesn't need to (and shouldn't) do its own
-; interrupt manipulation.  In testing, it seems problematic to leave
-; interrupts disabled for long periods, so the safest approach seems
-; to be to disable the interrupts only for the actual command execution.
+; Launches the currently loaded FTFA command and waits for completion.
+; Before calling, the caller must set up the FTFA command registers with 
+; the command code and any address and data parameters required.  The
+; caller should also disable interrupts, since an interrupt handler could
+; cause a branch into code resident in flash memory, which would violate
+; the rule against accessing flash while an FTFA command is running.
 
     EXPORT iapExecAndWait
 iapExecAndWait
     ; save registers
-    STMFD   R13!, {R1,R2,R3,R4,LR}
+    STMFD   R13!, {R1,R2,LR}
     
-    ; disable all interrupts in the NVIC
-    LDR     R3, =NVIC_ICER ; R3 <- NVIC_ICER
-    LDR     R4, [R3]     ; R4 <- current interrupt status
-    MOVS    R2, #0       ; R2 <- 0
-    SUBS    R2,R2,#1     ; R2 <- 0 - 1 = 0xFFFFFFFF
-    STR     R2, [R3]     ; [NVIC_ICER] <- 0xFFFFFFFF (disable all interrupts)
-
-    ; disable CPU interrupts
-    CPSID I              ; interrupts off
-    DMB                  ; data memory barrier
+    ; disable interrupts
+    CPSID   I            ; set the PRIMASK to disable interrupts
     DSB                  ; data synchronization barrier
     ISB                  ; instruction synchronization barrier
-
+    
     ; Launch the command by writing the CCIF bit to FTFA_FSTAT    
-    MOVS    R1, #0x80    ; CCIF (0x80)
-    STRB    R1, [R0]     ; FTFA->FSTAT = CCIF
+    LDR     R0, FTFA_FSTAT
+    MOVS    R2, #0x80    ; CCIF (0x80)
+    STRB    R2, [R0]     ; FTFA->FSTAT = CCIF
     
     ; Wait for the command to complete.  The FTFA sets the CCIF
     ; bit in FTFA_FSTAT when the command is finished, so spin until
     ; the bit reads as set.
 Lew0
     LDRB    R1, [R0]     ; R1 <- FTFA->FSTAT
-    MOVS    R2, #0x80    ; CCIF (0x80)
     TSTS    R1, R2       ; test R1 & CCIF
     BEQ     Lew0         ; if zero, the command is still running
     
-    ; restore CPU interrupts
-    CPSIE I
-
-    ; re-enable NVIC interrupts
-    LDR     R3, =NVIC_ISER ; R3 <- NVIC_ISER
-    STR     R4, [R3]     ; NVIC_ISER = old interrupt enable vector
+    ; re-enable interrupts
+    CPSIE   I
     
     ; pop registers and return
-    LDMFD   R13!, {R1,R2,R3,R4,PC}
+    LDMFD   R13!, {R1,R2,PC}
 
     ALIGN
-NVIC_ISER  DCD 0xE000E100
-NVIC_ICER  DCD 0xE000E180
+FTFA_FSTAT DCD 0x40020000
 
     END
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/NewMalloc/NewMalloc.cpp	Thu Mar 23 05:19:05 2017 +0000
@@ -0,0 +1,126 @@
+#include "mbed.h"
+#include "NewMalloc.h"
+
+extern void diagLED(int, int, int);
+
+// Custom memory allocator.  We use our own version of malloc() for more
+// efficient memory usage, and to provide diagnostics if we run out of heap.
+//
+// We can implement a more efficient malloc than the library can because we
+// can make an assumption that the library can't: allocations are permanent.
+// The normal malloc has to assume that allocations can be freed, so it has
+// to track blocks individually.  For the purposes of this program, though,
+// we don't have to do this because virtually all of our allocations are 
+// de facto permanent.  We only allocate dynamic memory during setup, and 
+// once we set things up, we never delete anything.  This means that we can 
+// allocate memory in bare blocks without any bookkeeping overhead.
+//
+// In addition, we can make a larger overall pool of memory available in
+// a custom allocator.  The RTL malloc() seems to have a pool of about 3K 
+// to work with, even though there really seems to be at least 8K left after 
+// reserving a reasonable amount of space for the stack.
+
+// halt with a diagnostic display if we run out of memory
+void HaltOutOfMem()
+{
+    printf("\r\nOut Of Memory\r\n");
+    // halt with the diagnostic display (by looping forever)
+    for (;;)
+    {
+        diagLED(1, 0, 0);
+        wait_us(200000);
+        diagLED(1, 0, 1);
+        wait_us(200000);
+    }
+}
+
+// For our custom malloc, we take advantage of the known layout of the
+// mbed library memory management.  The mbed library puts all of the
+// static read/write data at the low end of RAM; this includes the
+// initialized statics and the "ZI" (zero-initialized) statics.  The
+// malloc heap starts just after the last static, growing upwards as
+// memory is allocated.  The stack starts at the top of RAM and grows
+// downwards.  
+//
+// To figure out where the free memory starts, we simply call the system
+// malloc() to make a dummy allocation the first time we're called, and 
+// use the address it returns as the start of our free memory pool.  The
+// first malloc() call presumably returns the lowest byte of the pool in
+// the compiler RTL's way of thinking, and from what we know about the
+// mbed heap layout, we know everything above this point should be free,
+// at least until we reach the lowest address used by the stack.
+//
+// The ultimate size of the stack is of course dynamic and unpredictable.
+// In testing, it appears that we currently need a little over 1K.  To be
+// conservative, we'll reserve 2K for the stack, by taking it out of the
+// space at top of memory we consider fair game for malloc.
+//
+// Note that we could do this a little more low-level-ly if we wanted.
+// The ARM linker provides a pre-defined extern char[] variable named 
+// Image$$RW_IRAM1$$ZI$$Limit, which is always placed just after the
+// last static data variable.  In principle, this tells us the start
+// of the available malloc pool.  However, in testing, it doesn't seem
+// safe to use this as the start of our malloc pool.  I'm not sure why,
+// but probably something in the startup code (either in the C RTL or 
+// the mbed library) is allocating from the pool before we get control. 
+// So we won't use that approach.  Besides, that would tie us even more
+// closely to the ARM compiler.  With our malloc() probe approach, we're
+// at least portable to any compiler that uses the same basic memory
+// layout, with the heap above the statics and the stack at top of 
+// memory; this isn't universal, but it's very typical.
+
+extern "C" {
+    void *$Sub$$malloc(size_t);
+    void *$Super$$malloc(size_t);
+    void $Sub$$free(void *);
+};
+
+// override the system malloc
+void *$Sub$$malloc(size_t siz)
+{
+    return xmalloc(siz);
+}
+
+// custom allocator pool
+static char *xmalloc_nxt = 0;
+size_t xmalloc_rem = 0;
+
+// custom allocator
+void *xmalloc(size_t siz)
+{
+    // initialize the pool if we haven't already
+    if (xmalloc_nxt == 0)
+    {
+        // do a dummy allocation with the system malloc() to find where
+        // the free pool starts
+        xmalloc_nxt = (char *)$Super$$malloc(4);
+        
+        // figure the amount of space we can use - we have from the base
+        // of the pool to the top of RAM, minus an allowance for the stack
+        const uint32_t TopOfRAM = 0x20003000UL;
+        const uint32_t StackSize = 2*1024;
+        xmalloc_rem = TopOfRAM - StackSize - uint32_t(xmalloc_nxt);
+    }
+    
+    // align to a dword boundary
+    siz = (siz + 3) & ~3;
+    
+    // make sure we have enough space left for this chunk
+    if (siz > xmalloc_rem)
+        HaltOutOfMem();
+        
+    // carve the chunk out of the remaining free pool
+    char *ret = xmalloc_nxt;
+    xmalloc_nxt += siz;
+    xmalloc_rem -= siz;
+    
+    // return the allocated space
+    return ret;
+}
+
+// Remaining free memory
+size_t mallocBytesFree() 
+{
+    return xmalloc_rem;
+}
+
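The allocation strategy in NewMalloc.cpp above is a bump (never-free) allocator. A minimal host-side sketch of the same idea follows, under stated assumptions: `BumpPool` is a hypothetical class, the pool is a caller-supplied buffer rather than the region found by probing the system malloc(), and running out of space returns a null pointer instead of halting with a diagnostic LED display.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over a caller-supplied buffer.  Blocks are
// carved off the front of the pool and are never returned to it, so no
// per-block bookkeeping is needed.
class BumpPool {
public:
    BumpPool(char *base, size_t size) : nxt(base), rem(size)
    {
        assert(base != nullptr);
    }

    void *alloc(size_t siz)
    {
        // round the request up to a multiple of 4 bytes, as xmalloc() does
        siz = (siz + 3) & ~size_t(3);

        // out of space: report failure (the firmware halts here instead)
        if (siz > rem)
            return nullptr;

        // carve the chunk out of the remaining free pool
        char *ret = nxt;
        nxt += siz;
        rem -= siz;
        return ret;
    }

    // remaining free bytes, analogous to mallocBytesFree()
    size_t bytesFree() const { return rem; }

private:
    char *nxt;    // next free byte
    size_t rem;   // bytes remaining in the pool
};
```

Each allocation costs only its rounded-up size, which is the saving the comments above describe relative to a general-purpose malloc() that has to track blocks individually.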
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/NewMalloc/NewMalloc.h	Thu Mar 23 05:19:05 2017 +0000
@@ -0,0 +1,12 @@
+#ifndef _NEWMALLOC_H_
+#define _NEWMALLOC_H_
+
+#include "mbed.h"
+
+// our custom memory allocator
+void *xmalloc(size_t);
+
+// Number of free bytes remaining
+size_t mallocBytesFree();
+
+#endif
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/NewMalloc/OpNew.cpp	Thu Mar 23 05:19:05 2017 +0000
@@ -0,0 +1,13 @@
+#include "NewMalloc.h"
+
+// Overload operator new to call our custom malloc.  This ensures that
+// all 'new' allocations throughout the program (including library code)
+// go through our private allocator.
+void *operator new(size_t siz) { return xmalloc(siz); }
+void *operator new[](size_t siz) { return xmalloc(siz); }
+
+// Since we don't do bookkeeping to track released memory, 'delete' does
+// nothing.  In actual testing, this routine appears to never be called.
+// If it *is* ever called, it will simply leave the block in place, which
+// will make it unavailable for re-use but will otherwise be harmless.
+void operator delete(void *ptr) { }
--- a/NewPwm/NewPwm.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/NewPwm/NewPwm.h	Thu Mar 23 05:19:05 2017 +0000
@@ -238,6 +238,16 @@
         tpm->SC = TPM_SC_CMOD(1) | TPM_SC_PS(ps);
     }
     
+    // wait for the end of the current cycle
+    void waitEndCycle()
+    {
+        // clear the overflow flag
+        tpm->SC |= TPM_SC_TOF_MASK;
+        
+        // The flag will be set at the next overflow
+        while (!(tpm->SC & TPM_SC_TOF_MASK)) ;
+    }
+    
     // hardware register base
     TPM_Type *tpm;
     
@@ -328,6 +338,9 @@
         tpm->CONTROLS[ch_n].CnV = (uint32_t)((float)(tpm->MOD + 1) * val);
     }
     
+    // Wait for the end of a cycle
+    void waitEndCycle() { getUnit()->waitEndCycle(); }
+    
     // Get my TPM unit object.  This can be used to change the period.
     inline NewPwmUnit *getUnit() { return &NewPwmUnit::unit[tpm_n]; }
     
--- a/TLC5940/TLC5940.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/TLC5940/TLC5940.h	Thu Mar 23 05:19:05 2017 +0000
@@ -588,12 +588,15 @@
         // turn off the grayscale clock
         gsclk.glitchFreeWrite(0);
         
-        // make sure the gsclk cycle actually ends before we proceed - each
-        // cycle is 1/GSCLK_SPEED long, so we need about 3us
-        wait_us(3);
+        // Make sure the gsclk cycle ends, since the TLC5940 data sheet
+        // says we can't take BLANK high until GSCLK has been low for 20ns.
+        // (We don't have to add any padding for the 20ns, since it'll take
+        // at least one CPU cycle of 60ns to return from waitEndCycle().
+        // That routine won't return until GSCLK is low, so it will have
+        // been low for at least 60ns by the time we get back from this call.)
+        gsclk.waitEndCycle();
         
-        // and assert BLANK to end the grayscale cycle
-        blank = (enabled ? 1 : 0);  // for the slight delay (20ns) required after GSCLK goes low
+        // assert BLANK to end the grayscale cycle
         blank = 1;
     }
             
--- a/USBProtocol.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/USBProtocol.h	Thu Mar 23 05:19:05 2017 +0000
@@ -61,6 +61,7 @@
 //                         5 -> TV relay is on
 //                         6 -> sending IR signals designated as TV ON signals
 //              0x20 -> IR learning mode in progress
+//              0x40 -> configuration saved successfully (see below)
 //    00     2nd byte of status (reserved)
 //    00     3rd byte of status (reserved)
 //    00     always zero for joystick reports
@@ -83,6 +84,14 @@
 // retracted (pulled back) positions.  A negative value means that the plunger
 // is pushed forward of the park position.
 //
+// Status bit 0x40 is set after a successful configuration update via special
+// command 65 6 (save config to flash).  The device always reboots after this
+// command, so if the host wants to receive a status update verifying the 
+// save, it has to request a non-zero reboot delay in the message to allow
+// us time to send at least one of these status reports after the save.
+// This bit is only sent after a successful save, which means that the flash
+// write succeeded and the written sectors verified as correct.
+//
 // 2. Special reports
 // We subvert the joystick report format in certain cases to report other 
 // types of information, when specifically requested by the host.  This allows
--- a/ccdSensor.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/ccdSensor.h	Thu Mar 23 05:19:05 2017 +0000
@@ -43,21 +43,22 @@
 //      higher noise tolerance tends to result in reduced resolution.
 //
 //  2 = Maximum dL/ds (highest first derivative of luminance change per
-//      distance, or put another way, the steepest rate of change in
-//      brightness).  This scans the whole image and looks for the 
-//      position with the highest dL/ds value.  We average over a window
-//      of several pixels, to smooth out pixel noise; this should avoid
-//      treating a single spiky pixel as having a steep slope adjacent 
-//      to it.  The advantage I see in this approach is that it looks for
-//      the *strongest* edge, which should make it less likely to be fooled
-//      by noise that creates a false edge.  Algorithms 1 and 2 have
-//      basically fixed thresholds for what constitutes an edge, but this
-//      approach is more dynamic in that it evaluates each edge-like region
-//      and picks the one with the highest contrast.  The one fixed feature
-//      of this algorithm is the width of the edge, since that's limited
-//      by the pixel window; but we only deal with one type of image, so
-//      it should be possible to adjust the light source and sensor position
-//      to always yield an image with a narrow enough edge region.
+//      distance, or put another way, the steepest brightness slope).
+//      This scans the whole image and looks for the position with the 
+//      highest dL/ds value.  We average over a window of several pixels, 
+//      to smooth out pixel noise; this should avoid treating a single 
+//      spiky pixel as having a steep slope adjacent to it.  The advantage
+//      of this approach is that it looks for the strongest edge after
+//      considering all edges across the whole image, which should make 
+//      it less likely to be fooled by isolated noise that creates a 
+//      single false edge.  Algorithms 1 and 2 have basically fixed 
+//      thresholds for what constitutes an edge, but this approach is 
+//      more dynamic in that it evaluates each edge-like region and picks 
+//      the best one.  The width of the edge is still fixed, since that's 
+//      determined by the pixel window.  But that should be okay since we 
+//      only deal with one type of image.  It should be possible to adjust 
+//      the light source and sensor position to always yield an image with 
+//      a narrow enough edge region.
 //
 //      The max dL/ds method is the most compute-intensive method, because
 //      of the pixel window averaging.  An assembly language implemementation
@@ -98,11 +99,18 @@
         // Figure the scaling factor for converting native pixel readings
         // to our normalized 0..65535 range.  The effective calculation we
         // need to perform is (reading*65535)/(npix-1).  Division is slow
-        // on the M0+, so recast this as a multiply, in 64K-scaled fixed
-        // point ints.  To apply this, multiply by this inverse value and
-        // shift right 16 bits.
+        // on the M0+, and floating point is dreadfully slow, so recast the
+        // per-reading calculation as a multiply (which, unlike DIV, is fast
+        // on KL25Z - the device has a single-cycle 32-bit hardware multiply).
+        // How do we turn a divide into a multiply?  By calculating the
+        // inverse!  How do we calculate a meaningful inverse of a large
+        // integer using integers?  By doing our calculations in fixed-point
+        // integers, which is to say, using hardware integers but treating
+        // all values as multiplied by a scaling factor.  We'll use 64K as
+        // the scaling factor, since we can divide the scaling factor back
+        // out by using an arithmetic shift (also fast on M0+).  
         native_npix = nativePix;
-        scaling_factor = (65535U*65536U) / nativePix;
+        scaling_factor = (65535U*65536U) / (nativePix - 1);
         
         // set the midpoint history arbitrarily to the absolute halfway point
         memset(midpt, 127, sizeof(midpt));
@@ -140,7 +148,9 @@
             // a fixed-point number with 64K scale, so multiplying the
             // pixel reading by this will give us the result with 64K
             // scale: so shift right 16 bits to get the final answer.
-            r.pos = uint16_t((scaling_factor * uint32_t(pixpos)) >> 16);
+            // (The +32768 is added for rounding: it's equal to 0.5 
+            // at our 64K scale.)
+            r.pos = uint16_t((scaling_factor*uint32_t(pixpos) + 32768) >> 16);
             r.t = tpix;
             
             // success
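The fixed-point scaling described in the ccdSensor.h comments above can be checked off-target with a short sketch. The helper names `makeScalingFactor()` and `scalePixel()` are hypothetical; the arithmetic mirrors the patched lines, assuming 32-bit unsigned integers. The product never overflows 32 bits: scalingFactor * (npix - 1) is at most 65535 * 65536 = 4294901760, and adding 32768 for rounding gives at most 4294934528, which is below 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: precompute 65535/(npix-1) once, as a fixed-point
// value scaled by 64K.  This is the only division in the calculation.
inline uint32_t makeScalingFactor(uint32_t nativePix)
{
    assert(nativePix >= 2);   // need at least two pixels to define a range
    return (uint32_t(65535) * uint32_t(65536)) / (nativePix - 1);
}

// Hypothetical helper: convert a native pixel position to the normalized
// 0..65535 range with one multiply, one add, and one shift.  The +32768
// is 0.5 at the 64K scale, so the result rounds to nearest.
inline uint16_t scalePixel(uint32_t scalingFactor, uint32_t pixpos)
{
    return uint16_t((scalingFactor * pixpos + 32768U) >> 16);
}
```

For a three-pixel sensor, the middle pixel should map to 65535/2 = 32767.5. With the rounding term this yields 32768, where the unrounded shift would truncate to 32767.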
--- a/main.cpp	Sun Mar 19 05:30:53 2017 +0000
+++ b/main.cpp	Thu Mar 23 05:19:05 2017 +0000
@@ -216,6 +216,7 @@
 #include "math.h"
 #include "diags.h"
 #include "pinscape.h"
+#include "NewMalloc.h"
 #include "USBJoystick.h"
 #include "MMA8451Q.h"
 #include "tsl1410r.h"
@@ -313,108 +314,6 @@
 #endif
 
 
-// --------------------------------------------------------------------------
-//
-// Custom memory allocator.  We use our own version of malloc() for more
-// efficient memory usage, and to provide diagnostics if we run out of heap.
-//
-// We can implement a more efficient malloc than the library can because we
-// can make an assumption that the library can't: allocations are permanent.
-// The normal malloc has to assume that allocations can be freed, so it has
-// to track blocks individually.  For the purposes of this program, though,
-// we don't have to do this because virtually all of our allocations are 
-// de facto permanent.  We only allocate dyanmic memory during setup, and 
-// once we set things up, we never delete anything.  This means that we can 
-// allocate memory in bare blocks without any bookkeeping overhead.
-//
-// In addition, we can make a larger overall pool of memory available in
-// a custom allocator.  The RTL malloc() seems to have a pool of about 3K 
-// to work with, even though there really seems to be at least 8K left after 
-// reserving a reasonable amount of space for the stack.
-
-// halt with a diagnostic display if we run out of memory
-void HaltOutOfMem()
-{
-    printf("\r\nOut Of Memory\r\n");
-    // halt with the diagnostic display (by looping forever)
-    for (;;)
-    {
-        diagLED(1, 0, 0);
-        wait_us(200000);
-        diagLED(1, 0, 1);
-        wait_us(200000);
-    }
-}
-
-// For our custom malloc, we take advantage of the known layout of the
-// mbed library memory management.  The mbed library puts all of the
-// static read/write data at the low end of RAM; this includes the
-// initialized statics and the "ZI" (zero-initialized) statics.  The
-// malloc heap starts just after the last static, growing upwards as
-// memory is allocated.  The stack starts at the top of RAM and grows
-// downwards.  
-//
-// To figure out where the free memory starts, we simply call the system
-// malloc() to make a dummy allocation the first time we're called, and 
-// use the address it returns as the start of our free memory pool.  The
-// first malloc() call presumably returns the lowest byte of the pool in
-// the compiler RTL's way of thinking, and from what we know about the
-// mbed heap layout, we know everything above this point should be free,
-// at least until we reach the lowest address used by the stack.
-//
-// The ultimate size of the stack is of course dynamic and unpredictable.
-// In testing, it appears that we currently need a little over 1K.  To be
-// conservative, we'll reserve 2K for the stack, by taking it out of the
-// space at top of memory we consider fair game for malloc.
-//
-// Note that we could do this a little more low-level-ly if we wanted.
-// The ARM linker provides a pre-defined extern char[] variable named 
-// Image$$RW_IRAM1$$ZI$$Limit, which is always placed just after the
-// last static data variable.  In principle, this tells us the start
-// of the available malloc pool.  However, in testing, it doesn't seem
-// safe to use this as the start of our malloc pool.  I'm not sure why,
-// but probably something in the startup code (either in the C RTL or 
-// the mbed library) is allocating from the pool before we get control. 
-// So we won't use that approach.  Besides, that would tie us even more
-// closely to the ARM compiler.  With our malloc() probe approach, we're
-// at least portable to any compiler that uses the same basic memory
-// layout, with the heap above the statics and the stack at top of 
-// memory; this isn't universal, but it's very typical.
-
-static char *xmalloc_nxt = 0;
-size_t xmalloc_rem = 0;
-void *xmalloc(size_t siz)
-{
-    if (xmalloc_nxt == 0)
-    {
-        xmalloc_nxt = (char *)malloc(4);
-        xmalloc_rem = 0x20003000UL - 2*1024 - uint32_t(xmalloc_nxt);
-    }
-    
-    siz = (siz + 3) & ~3;
-    if (siz > xmalloc_rem)
-        HaltOutOfMem();
-        
-    char *ret = xmalloc_nxt;
-    xmalloc_nxt += siz;
-    xmalloc_rem -= siz;
-    
-    return ret;
-}
-
-// Overload operator new to call our custom malloc.  This ensures that
-// all 'new' allocations throughout the program (including library code)
-// go through our private allocator.
-void *operator new(size_t siz) { return xmalloc(siz); }
-void *operator new[](size_t siz) { return xmalloc(siz); }
-
-// Since we don't do bookkeeping to track released memory, 'delete' does
-// nothing.  In actual testing, this routine appears to never be called.
-// If it *is* ever called, it will simply leave the block in place, which
-// will make it unavailable for re-use but will otherwise be harmless.
-void operator delete(void *ptr) { }
-
-
 // ---------------------------------------------------------------------------
 //
 // Forward declarations
@@ -2397,8 +2296,10 @@
     // current LOGICAL on/off state as reported to the host.
     uint8_t logState : 1;
 
-    // previous logical on/off state, when keys were last processed for USB 
-    // reports and local effects
+    // Previous logical on/off state, when keys were last processed for USB 
+    // reports and local effects.  This lets us detect edges (transitions)
+    // in the logical state, for effects that are triggered when the state
+    // changes rather than merely by the button being on or off.
     uint8_t prevLogState : 1;
     
     // Pulse state
@@ -2414,7 +2315,7 @@
     // door is open and off when the door is closed (or vice versa, but in either 
     // case, the switch state corresponds to the current state of the door at any
     // given time, rather than pulsing on state changes).  The "pulse mode"
-    // option brdiges this gap by generating a toggle key event each time
+    // option bridges this gap by generating a toggle key event each time
     // there's a change to the physical switch's state.
     //
     // Pulse state:
@@ -2465,13 +2366,16 @@
     uint8_t data;       // key state byte for USB reports
 } mediaState = { false, 0 };
 
-// button scan interrupt ticker
-Ticker buttonTicker;
+// button scan interrupt timer
+Timeout scanButtonsTimeout;
 
 // Button scan interrupt handler.  We call this periodically via
 // a timer interrupt to scan the physical button states.  
 void scanButtons()
 {
+    // schedule the next interrupt
+    scanButtonsTimeout.attach_us(&scanButtons, 1000);
+    
     // scan all button input pins
     ButtonState *bs = buttonState, *last = bs + nButtons;
     for ( ; bs < last ; ++bs)
@@ -2591,7 +2495,7 @@
     }
     
     // start the button scan thread
-    buttonTicker.attach_us(scanButtons, 1000);
+    scanButtonsTimeout.attach_us(scanButtons, 1000);
 
     // start the button state transition timer
     buttonTimer.start();
@@ -3866,6 +3770,9 @@
 const uint8_t TV_RELAY_POWERON = 0x01;
 const uint8_t TV_RELAY_USB     = 0x02;
 
+// pulse timer for manual TV relay pulses
+Timer tvRelayManualTimer;
+
 // TV ON IR command state.  When the main PSU2 power state reaches
 // the IR phase, we use this sub-state counter to send the TV ON
 // IR signals.  We initialize to state 0 when the main state counter
@@ -3906,6 +3813,17 @@
 uint32_t tv_delay_time_us;
 void powerStatusUpdate(Config &cfg)
 {
+    // If the manual relay pulse timer is past the pulse time, end the
+    // manual pulse.  The timer only runs when a pulse is active, so
+    // it'll never read as past the time limit if a pulse isn't on.
+    if (tvRelayManualTimer.read_us() > 250000)
+    {
+        // turn off the relay and disable the timer
+        tvRelayUpdate(TV_RELAY_USB, false);
+        tvRelayManualTimer.stop();
+        tvRelayManualTimer.reset();
+    }
+
     // Only update every 1/4 second or so.  Note that if the PSU2
     // circuit isn't configured, the initialization routine won't 
     // start the timer, so it'll always read zero and we'll always 
@@ -4110,15 +4028,6 @@
     }
 }
 
-// TV relay manual control timer.  This lets us pulse the TV relay
-// under manual control, separately from the TV ON timer.
-Ticker tv_manualTicker;
-void TVManualInt()
-{
-    tv_manualTicker.detach();
-    tvRelayUpdate(TV_RELAY_USB, false);
-}
-
 // Operate the TV ON relay.  This allows manual control of the relay
 // from the PC.  See protocol message 65 submessage 11.
 //
@@ -4145,9 +4054,10 @@
         break;
         
     case 2:
-        // Pulse the relay.  Turn it on, then set our timer for 250ms.
+        // Turn the relay on and reset the manual TV pulse timer
         tvRelayUpdate(TV_RELAY_USB, true);
-        tv_manualTicker.attach(&TVManualInt, 0.25);
+        tvRelayManualTimer.reset();
+        tvRelayManualTimer.start();
         break;
     }
 }
@@ -4179,27 +4089,29 @@
 // delay time in seconds before rebooting.
 uint8_t saveConfigRebootTime;
 
+// status flag for successful config save - set to 0x40 on success
+uint8_t saveConfigSucceededFlag;
+
 // For convenience, a macro for the Config part of the NVM structure
 #define cfg (nvm.d.c)
 
 // flash memory controller interface
 FreescaleIAP iap;
 
-// NVM structure in memory.  This has to be aliend on a sector boundary,
-// since we have to be able to erase its page(s) in order to write it.
-// Further, we have to ensure that nothing else occupies any space within
-// the same pages, since we'll erase that entire space whenever we write.
-static const union
+// figure the flash address for the config data
+const NVM *configFlashAddr()
 {
-    NVM nvm;      // the NVM structure
-    char guard[((sizeof(NVM) + SECTOR_SIZE - 1)/SECTOR_SIZE)*SECTOR_SIZE];
-}
-flash_nvm_memory __attribute__ ((aligned(SECTOR_SIZE))) = { };
-
-// figure the flash address as a pointer
-NVM *configFlashAddr()
-{
-    return (NVM *)&flash_nvm_memory;
+    // figure the number of sectors we need, rounding up
+    int nSectors = (sizeof(NVM) + SECTOR_SIZE - 1)/SECTOR_SIZE;
+    
+    // figure the total size required from the number of sectors
+    int reservedSize = nSectors * SECTOR_SIZE;
+    
+    // locate it at the top of memory
+    uint32_t addr = iap.flashSize() - reservedSize;
+    
+    // return it as a read-only NVM pointer
+    return (const NVM *)addr;
 }
 
 // Load the config from flash.  Returns true if a valid non-default
@@ -4226,7 +4138,7 @@
     // the free space, it won't collide with the linker area.
     
     // Figure how many sectors we need for our structure
-    NVM *flash = configFlashAddr();
+    const NVM *flash = configFlashAddr();
     
     // if the flash is valid, load it; otherwise initialize to defaults
     bool nvm_valid = flash->valid();
@@ -4245,55 +4157,17 @@
     return nvm_valid;
 }
 
-void saveConfigToFlash()
+// save the config - returns true on success, false on failure
+bool saveConfigToFlash()
 {
     // make sure the plunger sensor isn't busy
     waitPlungerIdle();
     
     // get the config block location in the flash memory
     uint32_t addr = uint32_t(configFlashAddr());
-    
-    // loop until we save it successfully
-    for (int i = 0 ; i < 5 ; ++i)
-    {
-        // show cyan while writing
-        diagLED(0, 1, 1);
-        
-        // save the data
-        nvm.save(iap, addr);
-    
-        // diagnostic lights off
-        diagLED(0, 0, 0);
-        
-        // verify the data
-        if (nvm.verify(addr))
-        {
-            // show a diagnostic success flash (rapid green)
-            for (int j = 0 ; j < 4 ; ++j)
-            {
-                diagLED(0, 1, 0);
-                wait_us(50000);
-                diagLED(0, 0, 0);
-                wait_us(50000);
-            }
-            
-            // success - no need to write again
-            break;
-        }
-        else
-        {            
-            // Write failed.  For diagnostic purposes, flash red a few times.
-            // Then go back through the loop to make another attempt at the
-            // write.
-            for (int j = 0 ; j < 5 ; ++j)
-            {
-                diagLED(1, 0, 0);
-                wait_us(50000);
-                diagLED(0, 0, 0);
-                wait_us(50000);
-            }
-        }
-    }
+
+    // save the data    
+    return nvm.save(iap, addr);
 }
 
 // ---------------------------------------------------------------------------
@@ -5709,7 +5583,7 @@
                 nvm.valid(),        // a config is loaded if the config memory block is valid
                 true,               // we support sbx/pbx extensions
                 true,               // we support the new accelerometer settings
-                xmalloc_rem);       // remaining memory size
+                mallocBytesFree()); // remaining memory size
             break;
             
         case 5:
@@ -5962,11 +5836,6 @@
     // say hello to the debug console, in case it's connected
     printf("\r\nPinscape Controller starting\r\n");
     
-    
-    // debugging: print memory config info
-    //    -> no longer very useful, since we use our own custom malloc/new allocator (see xmalloc() above)
-    // {int *a = new int; printf("Stack=%lx, heap=%lx, free=%ld\r\n", (long)&a, (long)a, (long)&a - (long)a);} 
-    
     // clear the I2C connection
     clear_i2c();
 
@@ -6383,7 +6252,8 @@
         uint16_t statusFlags = 
             cfg.plunger.enabled             // 0x01
             | nightMode                     // 0x02
-            | ((psu2_state & 0x07) << 2);   // 0x04 0x08 0x10
+            | ((psu2_state & 0x07) << 2)    // 0x04 0x08 0x10
+            | saveConfigSucceededFlag;      // 0x40
         if (IRLearningMode != 0)
             statusFlags |= 0x20;
 
@@ -6513,7 +6383,8 @@
         if (saveConfigPending != 0)
         {
             // save the configuration
-            saveConfigToFlash();
+            if (saveConfigToFlash())
+                saveConfigSucceededFlag = 0x40;
             
             // if desired, reboot after the specified delay
             if (saveConfigPending == SAVE_CONFIG_AND_REBOOT)
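The address arithmetic in the new `configFlashAddr()` can be illustrated in isolation. This is a sketch under stated assumptions: `configFlashOffset` is a hypothetical free function, and the flash and sector sizes passed to it in the usage note are illustrative values, not figures taken from this changeset.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-alone version of the calculation in configFlashAddr():
// reserve enough whole sectors at the top of flash to hold the structure.
uint32_t configFlashOffset(uint32_t flashSize, size_t structSize,
                           uint32_t sectorSize)
{
    // number of sectors needed, rounding up to a whole sector
    uint32_t nSectors = uint32_t((structSize + sectorSize - 1) / sectorSize);

    // place the reserved area at the very top of flash
    return flashSize - nSectors * sectorSize;
}
```

For example, assuming 128 KB of flash and 1 KB sectors, a 1500-byte structure needs two sectors, so `configFlashOffset(128*1024, 1500, 1024)` gives 129024. Because erasing works on whole sectors, rounding up keeps the config data from sharing a sector with anything else.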
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mbed_Fixes/KLXX_us_ticker_fix.c	Thu Mar 23 05:19:05 2017 +0000
@@ -0,0 +1,31 @@
+#include <stddef.h>
+#include "us_ticker_api.h"
+
+// Bug fix: if scheduling an event in the past, schedule it for
+// the very near future rather than invoking the handler directly.
+// For Tickers and other recurring events, invoking the handler
+// can cause significant recursion, since the handler might try
+// to schedule the next event, which will end up back here, which
+// will call the handler again, and so on.  Forcing the event
+// into the future prevents this recursion and ensures bounded
+// stack use.  The effect will be the same either way: the handler
+// will be called late, since we can't actually travel back in time
+// and call it in the past.  But this way we don't blow the stack
+// if we have a high-frequency recurring event that has gotten
+// significantly behind (because of a long period with interrupts
+// disabled, say).
+extern void $Super$$us_ticker_set_interrupt(timestamp_t);
+void $Sub$$us_ticker_set_interrupt(timestamp_t timestamp) 
+{
+    // If the event was in the past, schedule it for almost (but not
+    // quite) immediately.  This prevents the base version from recursing
+    // into the handler; instead, we'll schedule an interrupt as for any
+    // other future event.
+    int tcur = us_ticker_read();
+    int delta = (int)((uint32_t)timestamp - tcur);
+    if (delta <= 0)
+        timestamp = tcur + 2;
+        
+    // call the base handler
+    $Super$$us_ticker_set_interrupt(timestamp);
+}
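The "is this event in the past?" test above relies on a signed difference of unsigned timestamps, which stays correct when the 32-bit microsecond counter wraps. A minimal sketch of just that comparison follows. `clampToFuture` is a hypothetical name, and the sketch assumes the usual two's-complement conversion from `uint32_t` to `int32_t`; it does not call into the mbed ticker API.

```cpp
#include <cstdint>

// Hypothetical helper (not in the mbed or Pinscape sources): if the
// requested timestamp is at or before 'now', push it slightly into the
// future; otherwise return it unchanged.
uint32_t clampToFuture(uint32_t timestamp, uint32_t now)
{
    // Unsigned subtraction wraps modulo 2^32; reinterpreting the result
    // as signed gives the distance to the event, negative if it is past.
    // This holds as long as the two times are less than 2^31 us apart.
    int32_t delta = int32_t(timestamp - now);
    if (delta <= 0)
        timestamp = now + 2;
    return timestamp;
}
```

For instance, `clampToFuture(5, 0xFFFFFFF0u)` returns 5, because the counter is about to wrap and the event is 21 microseconds ahead, while `clampToFuture(0xFFFFFFF0u, 5)` returns 7, because the event is 21 microseconds overdue.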
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mbed_Fixes/ticker_api_fix.c	Thu Mar 23 05:19:05 2017 +0000
@@ -0,0 +1,46 @@
+#include <stddef.h>
+#include "ticker_api.h"
+
+void $Sub$$ticker_insert_event(const ticker_data_t *const data, ticker_event_t *obj, timestamp_t timestamp, uint32_t id) {
+    /* disable interrupts for the duration of the function */
+    __disable_irq();
+
+    // initialise our data
+    obj->timestamp = timestamp;
+    obj->id = id;
+
+    /* Go through the list until we either reach the end, or find
+       an element this should come before (which is possibly the
+       head). */
+    ticker_event_t *prev = NULL, *p = data->queue->head;
+    while (p != NULL) {
+        /* check if we come before p */
+        if ((int)(timestamp - p->timestamp) < 0) {
+            break;
+        }
+        /* go to the next element */
+        prev = p;
+        p = p->next;
+    }
+
+    /* if we're at the end p will be NULL, which is correct */
+    // BUG FIX: set obj->next BEFORE calling set_interrupt(), to ensure
+    // that the list is in a consistent state if set_interrupt()
+    // happens to call the event handler, and the event handler
+    // happens to call back here to re-queue the event.  Such
+    // things aren't hypothetical: this exact thing will happen
+    // if a Ticker object gets more than one cycle behind.  The
+    // inconsistent state of the list caused crashes.
+    obj->next = p;
+
+    /* if prev is NULL we're at the head */
+    if (prev == NULL) {
+        data->queue->head = obj;
+        data->interface->set_interrupt(timestamp);
+    } else {
+        prev->next = obj;
+    }
+
+    __enable_irq();
+}
+
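The ordering fix in `ticker_insert_event` can be shown with a plain sorted linked list. This is a simplified sketch, not the mbed implementation: `Event` and `insertEvent` are hypothetical names, interrupt masking is omitted, and the interrupt call is replaced by a return value telling the caller that the head changed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified event node (the real ticker_event_t has more fields).
struct Event
{
    uint32_t timestamp;
    Event *next;
};

// Insert obj into the time-ordered list.  Returns true if obj became the
// new head, which is the point where the real code re-arms the interrupt.
bool insertEvent(Event *&head, Event *obj, uint32_t timestamp)
{
    obj->timestamp = timestamp;

    // walk until the end, or until we find an element obj should precede
    Event *prev = NULL, *p = head;
    while (p != NULL && int32_t(timestamp - p->timestamp) >= 0)
    {
        prev = p;
        p = p->next;
    }

    // Link obj to its successor FIRST, so the list is already consistent
    // if anything triggered below walks or modifies it.
    obj->next = p;

    if (prev == NULL)
    {
        head = obj;
        return true;
    }
    prev->next = obj;
    return false;
}
```

Inserting events with timestamps 200, 100 and 300 in that order leaves the list as 100, 200, 300, with the first two calls returning true (new head) and the third returning false.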
--- a/nvm.h	Sun Mar 19 05:30:53 2017 +0000
+++ b/nvm.h	Thu Mar 23 05:19:05 2017 +0000
@@ -42,8 +42,9 @@
                 && checksum == CRC32(&d, sizeof(d)));
     }
     
-    // save to non-volatile memory
-    void save(FreescaleIAP &iap, int addr)
+    // Save to non-volatile memory.  Returns true on success, false
+    // if an error code is returned from the flash programmer.
+    bool save(FreescaleIAP &iap, int addr)
     {
         // update the checksum and structure size
         d.sig = SIGNATURE;
@@ -52,13 +53,7 @@
         checksum = CRC32(&d, sizeof(d));
         
         // save the data to flash
-        iap.program_flash(addr, this, sizeof(*this));
-    }
-    
-    // verify that the NVM matches the in-memory configuration
-    bool verify(int addr)
-    {
-        return memcmp((NVM *)addr, this, sizeof(*this)) == 0;
+        return iap.programFlash(addr, this, sizeof(*this)) == FreescaleIAP::Success;
     }
     
     // stored data (excluding the checksum)