My checklist for debugging insurmountable issues

June 21, 2014

I listen to the Amp Hour podcast. Last week, they were talking about ways engineers fail and about checklists as a way to be more disciplined in avoiding preventable errors.

I try to learn from my mistakes so I do have a checklist of sorts when I reach the “it’s all broken and never, ever going to work again” stage of debugging. Of course, when I get to that point, the symptoms are usually different. Still, some parts of the path are relatively common.

1. Have you tried turning it off and back on again?

It is a joke, a very well-seasoned joke. But it is only funny because it is so horribly true. This isn’t a good way to debug a horrible problem, but it does establish the circumstances under which the problem happens. I often start from scratch to reproduce an issue because there is often a clue in the process. So turning it off and back on again is a way to make sure that I start from a well-understood starting point.

The next questions:

Really? Are you sure?

These follow most of the questions on this checklist. As embedded systems get more complicated, it is pretty easy to turn part of the system off but not all of it. I don’t usually start from no power at my desk; I almost always start with my computer on. But sometimes turning off all power is necessary (including restarting the debugger).

2. Does it have power?

This differs from the previous question in that the “it” refers to everything that may be broken: the processor, the debugger, the sensor or actuator, the level shifters, everything. This step usually requires a voltmeter (and is related to #5).

3. Is it running the code you think it is?

I sometimes have two or three code bases. If my development environment is pointed to the wrong one, I could easily be compiling over here but loading the image from over there. Or possibly, if the load process is difficult, I may think I’ve loaded the code and somehow missed a step.

I’m working on a big system now, a monitor of another system. When I compile the code, it goes through four machines before it gets to where I can try it out. The possibility that I typed cp instead of scp (or forgot the ending colon!), well, let’s just say it happens more often than I’d like.

This is the reason to have build numbers. But if you are really, really sure it is running the code you think, change something about the output and reload. Make sure you see the change.
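A minimal sketch of a build banner in C, assuming the build script passes a number in with something like -DBUILD_NUMBER=42. The BUILD_NUMBER macro and the format_build_banner function are hypothetical names, not from any particular toolchain. __DATE__ and __TIME__ are standard C macros, so the banner also changes whenever this file is recompiled (which your makefile may skip if the file itself didn’t change).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: the build script would pass -DBUILD_NUMBER=<n>.
   Default to 0 so the file still compiles without it. */
#ifndef BUILD_NUMBER
#define BUILD_NUMBER 0
#endif

/* Formats the banner to print first thing at boot on the debug port.
   Returns the snprintf result (the length the full banner needs). */
int format_build_banner(char *buf, size_t len)
{
    return snprintf(buf, len, "build %d (compiled %s %s)",
                    BUILD_NUMBER, __DATE__, __TIME__);
}
```

Print it before anything else runs. When you change something and reload, the banner should change too; if it doesn’t, you aren’t running what you think you are.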

4. Did you read the boot and debug output?

I write error logging features and always want a serial debug output. I love power-on self-tests. However, once the system is working, I stop looking at them. But sometimes, if a cable has loosened or hardware has failed, the system will tell me. But only if I am listening.
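Here is a sketch of a table-driven power-on self-test that reports every result to the debug output instead of failing silently. It assumes each check is a plain function returning true on success; post_entry, post_run, and the checks themselves are hypothetical names. On a board, log would be whatever stream is hooked to the serial port, and returning the failure count leaves the decision about what to do next to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A check returns true if that piece of hardware looks healthy. */
typedef bool (*post_check)(void);

typedef struct {
    const char *name;
    post_check run;
} post_entry;

/* Runs every check (even after a failure), prints one line per result,
   and returns the number of failures. */
int post_run(const post_entry *tests, size_t count, FILE *log)
{
    int failures = 0;
    for (size_t i = 0; i < count; i++) {
        bool ok = tests[i].run();
        fprintf(log, "POST %-12s %s\n", tests[i].name, ok ? "ok" : "FAIL");
        if (!ok) {
            failures++;
        }
    }
    fprintf(log, "POST done: %d failure(s)\n", failures);
    return failures;
}
```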

5. If it used to work, what changed?

Nothing changed, of course. It just stopped working. My minor code change could not possibly have caused something so catastrophic.

I’ve heard that. I’ve said it. That doesn’t make it true.

If nothing changed, then run the old code. If it fails, well, that’s interesting now, isn’t it? If it succeeds, well then, stop saying nothing changed, something obviously did.

This does require frequent commits to version control, so you can get back to a last-known-good image. But you were doing that anyway, right? And then you can binary search to determine where the error crept in.
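The binary search can be sketched in C, assuming revisions are numbered in commit order, the oldest is known good, the newest is known bad, and once it breaks it stays broken. The is_good callback is hypothetical: in practice it is “build that revision, load it, and try it.” If the code lives in git, git bisect runs this same search over commits for you.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: build, load, and test revision rev. */
typedef bool (*rev_is_good)(size_t rev, void *ctx);

/* Revisions 0..count-1 in commit order. Assumes revision 0 works,
   revision count-1 fails, and failures persist once introduced.
   Returns the index of the first bad revision. */
size_t first_bad_revision(size_t count, rev_is_good is_good, void *ctx)
{
    if (count < 2) {
        return 0; /* nothing to search */
    }
    size_t good = 0;        /* invariant: known good */
    size_t bad = count - 1; /* invariant: known bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_good(mid, ctx)) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    return bad;
}
```

Each test halves the range, so 100 commits take about seven build-and-try cycles instead of a hundred.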

Note that if the code change doesn’t explain it, compare the map files (even the binaries). Realizing that something crossed a boundary may give you insight. Oh, also, the makefile (or project file) may have changed: optimization settings can have big ramifications.

6. Is there anything interesting in the map file?

Like many embedded engineers, I find the map file to be a bit of an illegible mess. Happily, I’m not afraid of map files anymore. There is an amazing amount of information in them, and they provide a different perspective on the code; sometimes that jogs my memory.

7. Can you prove it is hardware?

Of course, at this point, it probably is. But you know hardware engineers, they can be feeble. They need proof. So what kind of proof can you offer? How can you break it down to show it cannot possibly be the software?

Seriously, I have had the privilege of working with some phenomenal hardware engineers. It is seldom hardware (but not never). The process of proving it is hardware is a good part of debugging. Plus, if you can make the difficult error repeatable for the hardware engineer, they’ll probably take you to lunch for making their jobs easier.

8. Can you explain the problem to another engineer?

When I tutored intro to CS, I asked people to explain their problem to a teddy bear outside my office before explaining it to me. I usually listened in. At least 50% of the people thanked the bear and left without talking to me. OK, probably only 20% thanked the bear, but most walked away because they never actually needed my help. They needed to get their thoughts in order, to explain the problem to themselves.

I do that now, talk to myself. Sometimes I try to explain it to a trusted colleague (or a junior engineer) in email, trying to figure out what questions they would ask me so the explanation is really good. Occasionally, I even send the email after all that, if I still can’t figure out the problem.

9. Did you use the single-line if again?

When I saw Apple’s goto fail bug, I completely understood. I avoid unbraced if statements because one month I tallied up my most common coding mistakes and found that unbraced if statements caused a disproportionate number of my bugs. I vowed never to use them again. Since this is a known failure point for me, it makes my checklist.
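A simplified C sketch of the pattern behind goto fail (the step functions are hypothetical stand-ins, not Apple’s code). In the unbraced version, the duplicated goto fail always runs, so the third check is skipped and the function returns the 0 left over from the second check, reporting success. With braces, the same accidental duplicate lands inside the block and does no harm. Recent GCC versions can also warn about the unbraced shape with -Wmisleading-indentation.

```c
/* Each step returns 0 on success, nonzero on failure (hypothetical checks). */
typedef int (*step_fn)(void);

/* The buggy shape: the copy-pasted goto is not governed by any if. */
int verify_unbraced(step_fn a, step_fn b, step_fn c)
{
    int err;
    if ((err = a()) != 0)
        goto fail;
    if ((err = b()) != 0)
        goto fail;
        goto fail; /* always executes: c() never runs and err is still 0 */
    if ((err = c()) != 0)
        goto fail;
fail:
    return err;
}

/* Same logic with braces: the identical duplicate line is now harmless. */
int verify_braced(step_fn a, step_fn b, step_fn c)
{
    int err;
    if ((err = a()) != 0) {
        goto fail;
    }
    if ((err = b()) != 0) {
        goto fail;
        goto fail; /* unreachable, but it no longer skips c() */
    }
    if ((err = c()) != 0) {
        goto fail;
    }
fail:
    return err;
}
```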



Scribbles to myself

May 28, 2014

I did Unix system administration in college. That was many, many years ago. And really, I managed the consultants, wrote quick references for users, made new accounts, and only filled in on deep technical management when someone made me (usually when someone else was sick or had lit something on fire). Good times. But I really was an expert Unix user at one time for multiple Unix varieties (hey, the math cluster was HP-UX, so I got deep into that the summer I spent working on computational math libraries to model fluid flow).

But, as I mentioned, it was many, many years ago. Since then, I’ve played with Linux, dabbled here and there. I’m more comfortable with Mac OS at the command line (yes, I’m that awful person who remapped my flower and control keys so my fingers didn’t need to re-learn ctrl-z when developing with Xcode).

My next contract will be all Linux-y and I’ve been wanting to do more embedded Linux (why is everyone so excited? When I played with it in 2006 it seemed like a great way to spend $100k in development time and then switch to something deterministic). Having borrowed a Beagle Bone Black, reinstalled Windows as 64-bit so I can access all of my RAM, and installed a virtual machine so my husband will stop laughing at me when I destroy things, I’m ready.

My first mini-project is to rebuild the BBB’s Angstrom distribution. The board I have is a bit old and the OS has been updated. It would be nice to return it from whence I borrowed it, all updated.

Of course, it isn’t that easy; I’m plagued with stupid things I feel like I should know. And I’m reading Chris Hallinan’s Embedded Linux Primer: A Practical Real-World Approach (2nd Edition). I read the first edition many years ago (about the time it came out, since we were still working on the embedded Linux project then, though my role was manager-only, not developer).

As I struggle with getting everything set up and configured as I like, I figured I should note some of my favorite commands.

On my Linux VM, here are some of the things I shouldn’t forget:

> cat /proc/version
Linux version 3.8.13-16.2.2.el6uek.x86_64 (mockbuild@ca-build44.us.oracle.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Tue Nov 26 08:41:44 PST 2013

I like to know where I am and what I’m running.

> sudo usermod -a -G dialout elecia

Since I got an off-the-shelf VM (Oracle’s Linux 6), I wanted an account for myself. I know better than to run as root all the time. On the other hand, I keep finding that I can’t use things. It took me a stupidly long time to remember that I have to log out and back in after changing group membership before the change takes effect.

> screen /dev/ttyUSB0 115200

As long as I remember to switch my USB-serial cable to the VM, I can snoop on the BBB as it boots up and use a command line there. U-Boot is neat. Also, control-a then d detaches the screen session so I can reattach later with screen -r; control-a then k kills it.

On my unmodified BeagleBone Black

root@beaglebone:~# cat /proc/version
Linux version 3.8.6 (koen@rrMBP) (gcc version 4.7.3 20130205 (prerelease) (Linaro GCC 4.7-2013.02-01) ) #1 SMP Sat Apr 13 09:10:52 CEST 2013

I think I’m going to need to learn more about Linaro. It has come up a few times. I don’t think it is like uClinux, which can run without an MMU. Still, Linaro seems common for the systems-on-a-chip (SoCs) I’m likely to want to use.

This one is more general:

ln -s /usr/local/bin/python2.7 python

That creates a symbolic link named python in the current directory that points to the target, /usr/local/bin/python2.7. I’ve missed symbolic links.

Alias is ok:

alias ll="ls -lags"

But it should be noted that friends do not do

alias vi="rm -rf"

When their terminal is left open in a public environment. That’s just wrong. Of course, the way I use vi, it might as well be right.

I’ve been trying to stay in the Linux environment for most of the stuff I’m doing. When I find myself typing in questions into my Windows browser, I stop and go back to Linux. The fact that the VM captures my keyboard so I can’t alt-tab out of there is probably a good thing.

Building Angstrom has been difficult: lots of dependencies where the build wants one version and another part of my OS requires something else. Since I got the off-the-shelf VM from Oracle, I don’t think I have the distribution that would have made building Angstrom easiest. I’m not sure what they want, but it isn’t the Linux I have.

Oh, another good command to remember:

 find . -name sanity.conf

Where the dot is where to search (from here down through all subdirectories) and -name is what to search for. I initially typed “find sanity.conf”, which leads to a pleasing error message:

find: `sanity.conf': No such file or directory

But that was the clue to me that find was one of those trickier commands that I usually mapped.

I was looking for sanity.conf because I got error messages that suggested I look in there for more information. Inside, there was a comment:

# Expert users can confirm their sanity with "touch conf/sanity.conf"

Windows doesn’t have this whimsy. I’d forgotten that. Linux is more clearly built by people, with all of the interpersonal spats associated therein.

Windows is more of a machine, with the personality of an annoying, talking paperclip. Linux is more of a crowded bar with lots of people talking in many groups. If you can find a quiet spot and a good group, you’ve won.

But standing on the outside, trying to figure out where to go, is excruciating. And wearing the wrong distribution can get you shunned.




Are you ok is up!

May 23, 2014

The build instructions for my are-you-ok widget are up on SparkFun! How neat!

This isn’t the end for that project. I’ve been working on getting email on Maxwell (surprisingly not difficult). I’m going to visit Hugh this weekend to see why his accelerometer is fussy.

In a couple weeks, Elizabeth will be on the podcast again to talk about what needs to happen next, whether she’s happy with the system, and what changes to make. There is a rumor that SparkFun will have a kit of parts for me to give away at that time to podcast listeners. (I need a contest! Guess a number? The quotes are too easy thanks to Google.)

For someone who seems to be always starting a contract next week (sigh), I have been busy. My EELive talk is going up on element14. I’m staying about a week ahead, though this week I have to do the summary and I’m not ready. Also, on element14, Sophi Kravitz asked me questions about consulting but I kept distracting her with RTOSs and stories of are-you-ok widgets. I’m happy with her resulting interview.

I went to the SOLID conference. It was interesting and eclectic. O’Reilly gave a big stack of my books away while I signed them and stress-chatted with people. I was there as press, recording things for the podcast. But Christopher says the noise level is too high; the results are too difficult to listen to. Argh. I’m not sure what I’m going to do. I do need to do something to “pay” for the press pass (and all the people I talked to).

I am still working with the Beagle Bone Black, though slowly. I updated my MacBook Pro from Win7 32-bit to Win7 64-bit which lets me use more than 2G of RAM. That is a complete re-install so I’m still finding things I forgot to back up (my bookmarks!). One reason to do this was to run virtual machines so now I have Linux running too. (It is the Oracle 6 one which seems to be Fedora based, I’m still orienting on how things work.) I’m trying to build Angstrom, just to update the OS that is on the board (step 1: update with known good image, step 2: update with my built image that should be the same as the known good, step 3: break everything).

I’m currently installing Python 2.7 because some precursor to actually building Angstrom needs it. The Python build just failed because it doesn’t have some library it needs. This is exactly how I remember Linux being.

I hope you have a good, relaxing weekend. I plan to.


Beagle Bone Black

May 17, 2014

I have a beagle. She’s a great dog.

No, that’s not right. She’s a terrible dog.

When you look up breed information about beagles, you see “merrily stubborn” and “amiable and determined.” What that means is “thinks you are an idiot but is pleased with the opportunity to laugh at you.”

My dog thinks I’m dumb for not wanting to roll in whatever it is she just rolled in. In her world, I’m her not-so-bright straight-man, trying to make her go in boring directions instead of following her supernose. But she’s a happy-go-lucky dog, having accepted the burden of trying to teach me about the joy of squirrels.

(Seriously, she’s an awesome dog, far too intelligent, and very seldom as sad as her pictures indicate.)

And so we plug the beagle into the USB port…

Every time I think about the Beagle boards, I fall into rumination about my pet. Let’s just say, this board had best not act like my dog.

But let’s see what it does act like… Philip over at Fliptronic loaned me his Beagle Bone Black for a week or two after my Twitter-whining about SparkFun’s lack of stock got overwhelming and I finally just asked if anyone had one I could borrow. Yay Philip!

I spent some time on beagleboard.org, reading about the system. It looks sort of like an Arduino or an mbed or any number of other small processor development boards. I keep forgetting that it (like the Raspberry Pi) is a computer, not really an embedded platform. Certainly, it has more oomph than the computers I had in the 90s.

Now that I have one in my possession, what am I going to do with it? I don’t have an end goal but I do have a couple of things to try, mostly following along with some Adafruit Beagle Bone tutorials.

Step 1 is to unbox it, then plug the BBB into USB. That was relatively unclimactic until I plugged the hub into my computer. Then it started signalling planes with its ridiculously bright blue LEDs. Next step: install drivers. Clearly my beagle told them about my mental deficiencies because they’ve made the getting started page really simple.

Though, of course, it didn’t work. (I swear, computers hate me.) Everything installed OK and didn’t say I needed to reset my computer. The getting started page says to use Chrome to navigate to, which will be a network-over-USB thing. That doesn’t work. The page says older software images require ejecting the BBB as a USB drive, but that gives me an error (“An error occurred while ejecting ‘Removable Disk (G:)’”; thanks, Microsoft). Unplugging USB doesn’t work, but it does get the flashing lights to stop.

Oh gods, the flashing lights. They make me really, really anxious. I put a SparkFun box over them but it still leaks. I thought I could deal with it, but making it stop was such a huge relief.

I think the next step is to find a bigger, less light leaking box. Oh, and reboot my computer to maybe activate those drivers I installed (despite the huge warnings Microsoft put up).

 After Windows reboot

The reboot didn’t work but using the USB cable from the box did. The BBB is serving up a webpage which has little scripts I can edit and run (from the webpage). There is an IDE that runs (Cloud9 IDE) though I have to sign up (I hate signing up for things, particularly for things I don’t know if I’ll want to play with).  There is an SSH shell that doesn’t work (“This webpage has a redirect loop”) until I set the date (there is a button on the page that will do that).

Lots to do. Lots of hardware pins.

I’m torn between dimming an LED and reading about I2C RTCs so I can use the information to talk to an accelerometer or something. (I suppose I have an I2C RTC around here somewhere, but I know where the accels and fuel gauges are.) I also want to update the FW build, maybe cross-compile it myself so I understand all of the pieces. Oh, and I could try out Willow Garage’s robot operating system for Angstrom (the Linux variant that the BBB runs).

Oh, I2C was pretty easy. I don’t even need to know the address? What sort of black magic is this?

There is so much here. I’ll be lost for a while. That’s OK.


Not every project works out

May 15, 2014

I don’t think we talk about failure enough. Sure, everyone says “you have to fail, it is the best way to learn” but no one likes to talk about it.

I thought about sharing a big failure story but I think, in the end, I’m not going to because it is also sad.

Still, I can’t just close this and tuck it into my drafts folder (that is a scary and slightly hilarious place) so I’ll tell you about this thing I thought “I’ll just hack that together while I’ve got free time”.

I made those motor boards (and I’m not ready to launch myself into working on my posture shirt (though I probably should)). And I’m waiting for more hardware to make another ayok widget. But I do have the little dog stuffed animal, now with an RGB LED.

What if it could snore? Or sort of breathe? Or have a heartbeat? Not for the ayok feature, just because it can be really comforting to be near something that is alive-ish and has no expectations. (I have real dogs, they have expectations. And my beagle thinks I’m stupid so that really helps my ego.)

Anyway, I got the idea, have the hardware, and thought it would make a cute little demo.  I also have this thought of trying to use the coin cells to con my hand into believing it touched something (briefly).

I went for the Arduino so I could post it on GitHub, write about it a bit. I had a slight plan to show the result to someone, but not a solid thing.

I coded up a little command interpreter for Arduino’s serial port. Then I played with PWM, discovering that the timer configuration is annoyingly nontrivial. I was surprised; Electric Imp and mbed both hide the guts of PWM. I read the timer sections of the ATmega328 datasheet, remembering how much I prefer other companies’ way of organizing information.
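The arithmetic behind that timer configuration, sketched in C for Timer1 in fast PWM mode with ICR1 as TOP (an assumed mode choice that allows a variable frequency). The pwm_setting struct and pwm_compute function are hypothetical, and the 16 MHz clock is an assumption matching an Arduino Uno. The PWM frequency is the clock divided by prescaler × (TOP + 1), and in non-inverting mode the output is high for compare + 1 ticks, so the code picks the smallest prescaler whose TOP fits in 16 bits, which gives the finest duty-cycle resolution. It only computes the values; writing the registers is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PWM_CLOCK_HZ 16000000UL /* assumption: 16 MHz system clock */

typedef struct {
    uint16_t prescaler; /* Timer1 clock divider: 1, 8, 64, 256, or 1024 */
    uint16_t top;       /* value for ICR1; one period is top + 1 ticks */
    uint16_t compare;   /* value for OCR1A; output high for compare + 1 ticks */
} pwm_setting;

/* Integer division rounds the period down, so the real frequency can be
   slightly above the request. Returns false if no prescaler fits or the
   duty is outside 1..100 (0% is better done by disconnecting the pin). */
bool pwm_compute(uint32_t freq_hz, uint8_t duty_pct, pwm_setting *out)
{
    static const uint16_t prescalers[] = {1, 8, 64, 256, 1024};

    if (freq_hz == 0 || duty_pct == 0 || duty_pct > 100) {
        return false;
    }
    for (size_t i = 0; i < sizeof prescalers / sizeof prescalers[0]; i++) {
        uint64_t divisor = (uint64_t)prescalers[i] * freq_hz;
        uint64_t ticks = PWM_CLOCK_HZ / divisor; /* ticks per period = TOP + 1 */
        if (ticks >= 2 && ticks <= 65536u) {
            uint32_t high = (uint32_t)(ticks * duty_pct / 100u);
            if (high == 0) {
                high = 1; /* shortest possible pulse */
            }
            out->prescaler = prescalers[i];
            out->top = (uint16_t)(ticks - 1);
            out->compare = (uint16_t)(high - 1);
            return true;
        }
    }
    return false;
}
```

For example, 200 Hz at 50% comes out as prescaler 8, TOP 9999, compare 4999.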

I made commands for changing the duty cycle and the frequency. I made a little script parser so I could have a snore be inhale, snork, pause, exhale, pause, changing the duty cycle and frequency for each stage.
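One way to express that snore without a parser is a fixed table of stages, sketched in C. The stage names come from the description above, but the duty, frequency, and duration numbers are made up, and snore_stage_at is a hypothetical helper that finds which stage is active at a given time in the looping script. A main loop would call it and push that stage’s duty and frequency to the PWM; a duty of 0 means the motor is off.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t duty_pct;     /* motor drive strength, 0 = off */
    uint16_t freq_hz;     /* PWM frequency, which changes the motor's sound */
    uint16_t duration_ms; /* how long to hold this stage */
} snore_stage;

/* Illustrative values only; tune by ear on the real motor. */
static const snore_stage snore[] = {
    {"inhale", 40, 120, 1200},
    {"snork", 90, 60, 300},
    {"pause", 0, 120, 400},
    {"exhale", 30, 150, 1500},
    {"pause", 0, 120, 1600},
};

/* Returns the stage active t_ms into the repeating script, or NULL if
   the script is empty or has zero total length. */
const snore_stage *snore_stage_at(const snore_stage *script, size_t count,
                                  uint32_t t_ms)
{
    uint32_t cycle = 0;
    for (size_t i = 0; i < count; i++) {
        cycle += script[i].duration_ms;
    }
    if (cycle == 0) {
        return NULL;
    }
    uint32_t t = t_ms % cycle; /* position within the current loop */
    for (size_t i = 0; i < count; i++) {
        if (t < script[i].duration_ms) {
            return &script[i];
        }
        t -= script[i].duration_ms;
    }
    return NULL; /* not reached */
}
```

A table keeps the timing in one place, which makes it easy to tweak a single stage without touching the sequencing code.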

But it didn’t work that well. The snore wasn’t all that consistent and the code freaked out sometimes. I fixed the freak-out by modifying my PWM code to use the overflow interrupt to reload the registers. But it still wasn’t good.
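A sketch of that overflow-interrupt update pattern in C: the main loop only stages new values, and the overflow handler copies them into the timer at the start of the next period, so the period and duty cycle change together instead of partway through a cycle. The reg_top and reg_compare variables are hypothetical stand-ins for ICR1 and OCR1A so this runs on a PC; on the ATmega the handler body would live in the Timer1 overflow interrupt (ISR(TIMER1_OVF_vect) in avr-libc).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the timer registers (ICR1 and OCR1A on an ATmega328). */
static volatile uint16_t reg_top;
static volatile uint16_t reg_compare;

/* Values staged by the main loop, applied only at overflow. */
static volatile uint16_t pending_top;
static volatile uint16_t pending_compare;
static volatile bool pending;

/* Main loop side: stage new values without touching the timer.
   On real hardware, disable interrupts around these writes so the
   handler never sees a half-updated pair. */
void pwm_request(uint16_t top, uint16_t compare)
{
    pending_top = top;
    pending_compare = compare;
    pending = true;
}

/* Overflow handler body: the counter has just wrapped, so both values
   take effect at the start of a period. */
void pwm_overflow_isr(void)
{
    if (pending) {
        reg_top = pending_top;
        reg_compare = pending_compare;
        pending = false;
    }
}
```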

And then my computer’s screen started acting really strangely, strangely in time with the PWM going on and off. I had the Arduino and motor board powered from my USB port.  The motor only takes ~0.5A and the Arduino isn’t a big load. Maybe it was just how dirty the power got. Or maybe it was completely unrelated.

That was the last straw. My results weren’t good. The processor was more annoying to use than I expected. My end goal is fuzzy (and not in a cute-fuzzy sort of way).

Motivating myself to do these projects requires me to like the project. Sure, there were times when the ayok widget was less than fun, debugging can be a grind. But this was just a mess all over. The most fun I had was with the command line parser.

I could persist through this, maybe make something I don’t hate. Probably switch to an mbed to drive it, use a USB hub or external power, maybe get a selection of small motors. But I don’t really want to.

The failure here is not in stopping. The failure is that I’m not learning anything.

I know that with the right motor and PWM tuning I can get snoring working (I’ve done it before). I’m confident heartbeat isn’t tough. And I’m pretty sure I won’t be able to make a determination on my haptic hand-touching-wall thing because it won’t work and I won’t know whether to blame the mounting or the code or the unreality of the situation.

So, it is a little fail and, accordingly, a little depressing.

And I’m not sure what I’m going to do next. I’m hoping to get a contract soon (maybe today, we’ll see). But I’m also borrowing a BeagleBone Black because I’ve been wanting to try it out. I don’t know what my plan is with that. I’ve also been pondering putting uClinux on a Cortex-M3 devkit (NXP or ST? Or something else?). It is always nice to see how OSs run on processors. I could compare it to FreeRTOS or some of the other small OSs. Build up a library of what’s good for what. But that sounds soooo boring.

Without a goal, I am far more likely to get discouraged.