Monday, October 5, 2015

An adventure in IoT

Dear diary,

It has been a long time since I last wrote to you ....

Uhm, yeah, whatever....

Lately, I have been too busy at work. Actually, I still am.
Our game went into closed beta and all hell broke loose, as usual.
I am not complaining, I like what I do, it just leaves me with no energy to do any of my hobbies when I get home. My favorite joke lately is:

"After a busy day at work, the last thing I want to look at when I get home is code.
I am so happy I am not a gynecologist"

So I had nothing to brag about ... except about work, which I can't do here.

Fortunately ?!?, I got sick these days, which forced me to stay at home for two afternoons, and then came the weekend. As much as being sick sucks, it leaves you with plenty of spare time, and you'd better spend it doing something interesting that keeps you busy, so you don't just lie down and think about how you feel like shit all day. If you can.
And, in the meantime, I got an idea for a simple DIY project. Basically, I've always wanted to be able to work from home with my morning coffee (which takes about an hour). Technically, it is not a problem: there are more than enough solutions for remote access, some of them can even handle launching a 3D game and are secure enough. I am using one and it works quite alright. But there is this minor issue - most computers at the office are automatically shut down every night, because there have been too many cases of power shortages, which led to hardware being blown. Let's not comment on how absurd that is in the 21st century, in an area called "Business Park" with 1000s of computers spread across 100s of offices. Anyway, my problem is simple - I cannot remote access my PC while it is shut down.
There are many solutions, like Wake-on-LAN or simply arranging for my PC not to be shut down (and taking the risk that something might blow during the next power spike).
But, I thought of something cooler - a device that would sit on top of my PC's case and physically press its power button when commanded over the internet!
Sounds simple, right? Basically all you need is a little servo and a way to control it over the internet.
And I have a spare servo lying around - I ordered 3 servos some time ago, 2 of them are used for my birthday present for my cats and the 3rd one was just waiting ... for this moment :)
I also have some ESP8266 boards waiting to be used. If you don't know what that is - it is a little and very cheap device that has WiFi capabilities and can be programmed. It is becoming more and more popular all over the internet. I have written several posts about it: here, here and here.

Oh wait, I forgot my typical disclaimer - this post is going to be full of technical details. It will contain a good amount of ranting, this time towards a group of open source developers you've never heard of, developing a product called NodeMCU. If you're not interested in technical details, here is the usual summary: I'm ok, I haven't been blogging lately, because I was (and still am) too busy at work. I finally had some time to tinker with another gadget. I haven't even finished it yet, and I don't know if I ever will. But, as they say, what's important is the journey, not the destination (good luck selling that to a publisher, by the way). So, I advise you to continue reading only if you're interested in technical details.

Now, for the (3?) people who made it this far, let's continue :)

At the moment, there are 4 ways to "program" the ESP8266:

1) Using it as a serial-to-WiFi device, by sending modem-like AT commands to its serial port. This requires either plugging it into a (running) PC or using another device, like an Arduino. Could work (with an Arduino), but I am not a fan of this approach, simply because the ESP itself is programmable and does not need an Arduino.

2) Using a leaked SDK - I think no one does that anymore - it is too cryptic and cumbersome. No, thanks.

3) Using esp8266 / Arduino - this lets you program the ESP in the Arduino environment in C++, very similar to how you program the Arduino itself. This is what I have been using until now to mess around with the ESP. I still think this is the best approach.

4) Using NodeMCU - a firmware which allows you to program the ESP in Lua. I have always been skeptical about that - an interpreted language on such a "tight" platform?!? But it is getting more and more popular, and I was feeling adventurous, so I decided to give it a try.

So, I chose NodeMCU this time. The task (just control a servo upon a request from the internet) seemed simple enough. Plus, there is this interesting concept of being able to push software updates over the air - since it is interpreted, you could (theoretically) send the new code / scripts over the internet and the host code would rewrite its files and reboot. Sounds cool, right? Except that in practice it is nearly impossible. I will probably explain why later.
Also, using method 3 requires you to reflash the ESP every time you upload your program. You need to reset the ESP into programming mode, upload the new firmware (your program) then test. It's easy, but takes much more time than I would like. With Lua, you just upload the new script and that's it (or at least it seemed so in the beginning).

The first thing I tried was to flash an ESP with the NodeMCU firmware, hook up a servo and give it a spin. Worked like a charm, super easy, done in like 15 min. So far so good.
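
For anyone wondering what "give it a spin" boils down to: a hobby servo is driven by a pulse repeated roughly every 20 ms, and the pulse width sets the angle. The snippet below is only a rough sketch of that arithmetic, not the code I actually ran. It assumes a 50 Hz frame, a 1.0 ms pulse at 0 degrees and 2.0 ms at 180 degrees (real servos vary), a 10-bit duty range (0..1023) and a floating point Lua build. The function name servo_duty is made up for the example.

```lua
-- Map a servo angle (0..180 degrees) to a 10-bit PWM duty value.
-- Assumptions (not from the project): 50 Hz frame (20 ms period),
-- 1.0 ms pulse at 0 degrees, 2.0 ms pulse at 180 degrees.
local function servo_duty(angle)
    -- Clamp the input so a bad command cannot drive the servo past its stops.
    if angle < 0 then angle = 0 elseif angle > 180 then angle = 180 end
    local pulse_ms = 1.0 + angle / 180             -- 1.0 .. 2.0 ms
    return math.floor(pulse_ms / 20 * 1023 + 0.5)  -- fraction of the 20 ms frame
end

print(servo_duty(0), servo_duty(90), servo_duty(180))

-- On the device this value would presumably be fed to the firmware's pwm
-- module, along the lines of (assumed usage, pin number is a placeholder):
--   pwm.setup(pin, 50, servo_duty(0))
--   pwm.start(pin)
--   pwm.setduty(pin, servo_duty(90))
```

With these assumptions the usable duty range is tiny (about 51 to 102 out of 1023), so the angular resolution is coarse. For pushing a power button that should not matter.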

Then I stumbled into the old problem that the ESP does crazy stuff on pins 0 and 2 on boot. I complained about it in previous posts, but basically the ESP doesn't like it when stuff is connected to its GPIO0 and GPIO2 pins on startup. With a servo attached, the results were frightening - if you boot it with the servo attached, the servo produces horrible noises and doesn't move at all. Luckily, it didn't blow up, because I have just 1 spare at the moment. I have dealt with this problem before, but the solution involved so many external components that it just wasn't worth it.
For this project, I just use an ESP-201 - a different variant, which has more GPIO pins broken out. So I leave 0 and 2 alone and use another one (GPIO4 in this case).

Oh, and while I am on the subject of ESP-201 - whoever produced it like this is a fucking idiot. If you don't look carefully at the pictures, it looks like a nice breadboard-friendly board with two rows of 2.54mm spaced male headers and a serial header. On a second look (or in my case when you order 3 and they arrive) you find out that:
- The serial header pins are pointing in the same direction as the other pins. So you can't put it on a breadboard after all. Not the way it is. Some people say they just bent the serial headers so they stay horizontal and then it can be placed at the edge of a breadboard.
- The labels on the pins are on the BOTTOM side! Are you fucking kidding me? How am I supposed to read those labels when I put the thing on a breadboard?
All they had to do was to solder the side headers on the other side, or simply not solder them at all, like most electronics manufacturers do. Now that I think of it, this is the only piece of electronics I have bought that comes with headers soldered and they are soldered on the wrong side!
Luckily, with the use of the right tool, it takes 5-10 min to fix it.

Ok, back to the project.
After I verified that I could control a servo with the ESP, all I had to do was make it listen for a command over the internet and then move the servo so it pushes the power button on the PC.
And here comes the next problem - I can't simply run a small server on the ESP waiting for commands, because it has to stay in the office, connected to the office WiFi. And there is no way I can convince the admins to open a port for me (well, maybe there is, but it's not worth it). So it won't be reachable from the outside. Hmm.
The obvious solution is to have the ESP connect to some server, which I can reach from anywhere. Like my smart home server. That's a plan!

The simplest way would have been to have the ESP send HTTP requests to poll my home server periodically and program my home server to tell it if it needs to do its thing. But that's lame.
So, I enhanced my home server to also accept plain TCP connections, so the ESP connects once and the connection is kept alive. So now, I send an HTTP request from any web browser (e.g. my phone) from anywhere in the world to my home server, then it immediately sends a message to the ESP (over the persistent TCP connection) to activate the servo. No polling, no lag, good stuff.
I had to refactor my home server, because it was using some pretty basic blocking approach (a web server example I downloaded from the net) which was totally unsuitable for the task. Now it uses asynchronous accepts and receives and can handle multiple connections. Fancy stuff. It turns out simple HTTP was easier than good-old-plain-TCP.
As a side effect, the two instances of my home server (one in the living room and one on the terrace) are now talking over a single persistent TCP connection, instead of bombarding each other with HTTP requests (asking each other what the temperature in their room is every 5 seconds).
Oh, and I finally found a workaround for a nasty bug in WPF font rendering that caused the clock displayed on the living room PC (running XP) to disappear sometimes.

Now, to the ESP part.
Programming it in Lua turned out to be a nightmare. Something so simple took me so long!
The whole idea of using Lua is to make programming simple stuff a breeze. But not in this case.
I had unexpected problems at every single step. And it is pretty difficult to find out why something does not work when you don't have a debugger and error messages are either non-existent (the thing just reboots without saying why) or obscure, like "attempt to index a nil value". Period. Where? Which value? Which file and line? Sometimes it tells you, but often it doesn't. Add to that the bugs in the firmware, so code that should work sometimes just doesn't. Are you starting to feel my pain?

But these are not even the biggest issues.
THE biggest issue is "out of freaking memory". All the time. I am so sick of seeing this message.
Let's look at some specs:
The ESP has a whopping 96KB of data RAM (compared to ~1.5K on the Arduino). It also runs at 80MHz (compared to the Arduino's 16MHz). So, compared to the Arduino, it should be a beast. And it is.
But not if you put a NodeMCU firmware on it and run Lua.
Out of these 96K, after the firmware and runtime, you have around 20K free for your program. The thing is, these 20K get eaten very fast. Your code and your runtime data are all supposed to fit in there (remember, in Lua code is data). I had a hard time fitting a simple TCP client (around 200 lines of code with lots of blanks). And that is after precompiling some of the code.

Let's talk about precompiling for a second. Normally, you just write some Lua code, send it to the ESP and it executes it. You can organize it in files saved on the ESP's flash. But once you hit the out of memory message (very soon) you can precompile some of the files, so it runs byte code instead of raw source. It also makes files "load" faster, as if it matters.
But now, the process of uploading a program (split into several source files) starts to become annoying. It looks like this:
- Make changes to the source code
- Upload it to the ESP
- Reboot the ESP (otherwise there would not be enough memory to compile it)
- Tell the ESP to compile it
- Run
If your code is split into several files you have to do that for each one of them.
I am not sure which takes longer - this procedure or just writing C++ and flashing the complete firmware every time.
And still, precompiling helps, but not by much. Right now, my simple client runs with about 7K of RAM free. If you go below 5K, bad things start to happen.
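
The compile step could in principle be automated by a few lines at the very top of the boot script, while the heap is still as free as it gets. What follows is a rough sketch of that idea and not something I have running. It assumes the firmware's file.list(), file.remove() and node.compile() behave as documented, and that one boot has enough memory to compile all pending files, which I have not verified. The function name compile_pending and the fake modules are made up so the logic can be tried on a PC with plain Lua.

```lua
-- Compile every pending *.lua source to bytecode, then delete the source.
-- `fs` and `compiler` are passed in so this can be tested off-device; on the
-- ESP they are assumed to be NodeMCU's `file` and `node` modules.
-- `keep` lists files that must stay as source (the boot script, for one).
local function compile_pending(fs, compiler, keep)
    local compiled = {}
    for name in pairs(fs.list()) do
        if name:match("%.lua$") and not keep[name] then
            compiler.compile(name)   -- expected to write the matching .lc file
            fs.remove(name)          -- free the flash space used by the source
            compiled[#compiled + 1] = name
        end
    end
    table.sort(compiled)             -- pairs() order is unspecified
    return compiled
end

-- Fake flash file system and compiler, for trying the logic on a PC.
local files = { ["init.lua"] = 120, ["client.lua"] = 4096,
                ["servo.lua"] = 900, ["notes.txt"] = 10 }
local fake_fs = {
    list = function()
        local copy = {}
        for k, v in pairs(files) do copy[k] = v end
        return copy
    end,
    remove = function(name) files[name] = nil end,
}
local fake_node = {
    compile = function(name)
        local lc_name = name:gsub("%.lua$", ".lc")
        files[lc_name] = 1
    end,
}

local done = compile_pending(fake_fs, fake_node, { ["init.lua"] = true })
print(table.concat(done, ", "))      -- client.lua, servo.lua
```

On the device the call would presumably be compile_pending(file, node, { ["init.lua"] = true }), followed by loading the compiled .lc files. It does not remove the reboot from the cycle, it only saves typing the compile commands by hand for each file.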

Anyway, I pretty much made it. The ESP connects to my server, waits for a command and ... blinks an LED when instructed :) I use blinking for testing; to complete the project I have to replace the LED blink with the servo code (and hope it fits in memory) and I am done. It has been working rock solid for two days now. There are some occasional drops, but I have programmed it so that it keeps trying, and even reboots and keeps trying to connect to my home server. All good.

Then I started thinking how it can be made better. I mean, programming in Lua for the ESP in general, not the current project.
What was bothering me is the way you code for it.
On the NodeMCU homepage they put this in the list of features:

Nodejs style network API
Event-driven API for network applicaitons, which faciliates developers writing code running on a 5mm*5mm sized MCU in Nodejs style. Greatly speed up your IOT application developing process.

Well, I have only heard of "Nodejs" without knowing what it actually is. All I know is that it is some javascript thing web developers use for something.
But, now I know what event-driven API means in terms of NodeMCU.
Basically, it means this:
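con = net.createConnection(net.TCP, 0);
-- once connected, identify ourselves to the server
con:on("connection", function() con:send("TCP ESP8266") end)
con:connect(port, ip)
-- repeating timer: report free heap every 5 seconds
tmr.alarm(3, 5000, 1, function() con:send("PING mem=" .. node.heap()) end)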

instead of this:
con = connect(ip, port, net.TCP) -- block this "thread" until connect finishes
con:send("TCP ESP8266")
while true do
    sleep(5000)
    con:send("PING mem=" .. node.heap())
end

Now, that is ridiculous!
The whole point of using Lua is so that you can program the second way.
It may not be clear from these simple examples, but programming with callbacks is probably the worst way of doing things. As soon as things start to get a bit more complicated it becomes exponentially more difficult to write the code as well as to read and understand it (i.e. bug fix it and maintain it). Hell, even debugging with a proper debugger (if you have that luxury) is still hard.
The whole beauty of Lua is that you can turn messy callback spaghetti into linear code that is easy to write and understand.
So why on earth did they go through all this trouble porting Lua to the ESP when they ultimately give you THIS?
If you still have some doubts, try porting the following simple piece of code to the "callback" paradigm:
for i=1,5 do
    print(i)
    sleep(1000)
end
It will probably look something like this:
i = 1
print(i)
timer(1000, function()
    i = i + 1
    print(i)
    if (i >= 5) then stop_timer() end -- stop after printing 5, not 6
end)
Now, NodeMCU can suck my balls. And, if Nodejs is like this ... well, now I know why I have 2 balls :)

Then I thought I should do it myself - not suck my balls - provide linear programming style for NodeMCU. After all, they say coroutines work in their firmware, so it can be done.
Except they don't.
I tried.
In general, they work, but not always.
It turns out, you can't coroutine.resume() in a socket event callback, like on("connection").
So, I googled it and found a bug report issued by some guy complaining about the same thing.
And here is the response by someone nicknamed "TerryE" (I don't know for sure, but I am assuming the guy is one of the devs):

the nodeMCU eLua model is event driven. Use the event system to implement concurrency. Coroutining across event threads won't work.

Dear Mr. TerryE,
Both of my balls are now occupied, so you have to suck my dick!

Or, in a more polite manner:

You asshole,

I don't mind your software having bugs (it's open source under development, that's understandable),
but telling people they shouldn't try to use language features that MUST work (if you pretend you support the language) just because it is not the programming style you had in mind - that is unacceptable. Especially if it is the worst programming style possible, but that is a matter of another discussion. For the sake of strict bug reporting - there is a language feature not working properly. Either fix it, or put a big disclaimer on some front page explaining why it does not work and cannot be fixed. Or even better, go kill yourself somewhere quietly and have someone else fix it.

I think I am done now.

Oh, by the way, I found a workaround for that bug and did implement a coroutine-based linear-style TCP client. The problem is, pretty much the same code now runs with around 3K memory left (due to the extra code providing the framework) and does not run stably (due to the low memory and due to other bugs in the firmware).

Ok, I really am done now.

P.S. The new prototype version of the TCP client code (without the framework) looks like this:
function Client(ip, port)
    --on("error", function(thread, event, ...) log("Error", unpack(arg)) end)
    while true do
        local socket
        while true do
            log("connecting ...")
            socket = connect(ip, port, 5000)
            if socket then
                log("connected")
                break
            end
            log("failed")
            sleep(5000)
        end
        sleep(1000)
        log("logging in ... ")
        if send(socket, "TCP test", 5000) then
            log("success")
            sleep(1000)
            while true do
                log("sending status ...")
                if not send(socket, "PING mem=" .. node.heap()) then
                    break
                end
                log("sent")
                sleep(1000)
                log("requesting temp ...")
                if not send(socket, "temp", 5000) then
                    break
                end
                log("sent")
                log("receiving ...")
                local res = receive(socket, 10000)
                if not res then
                    break
                end
                log("received", res)
                while true do
                    res = receive(socket, 0)
                    if not res then break end
                    log("received", res)
                end
                sleep(5000)
            end
        end
        log("failed")
        sleep(5000)
    end
end
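
The helpers used above (sleep, connect, send, receive, log) come from the framework, which is not shown here. Purely as an illustration of the idea, and not the actual framework code, here is a minimal sketch of how a blocking-looking sleep() can be layered on top of a callback timer with coroutines. It runs in plain Lua on a PC against a simulated timer queue. set_timeout, run_timers and spawn are made-up names, not NodeMCU APIs, and the sketch sidesteps the firmware bug mentioned earlier because there are no socket callbacks in it.

```lua
-- Simulated one-shot timer queue, standing in for the firmware's timer callbacks.
local now, pending = 0, {}

local function set_timeout(ms, callback)   -- hypothetical callback-style timer
    pending[#pending + 1] = { at = now + ms, fn = callback }
end

local function run_timers()                -- fake event loop: fire timers in time order
    while #pending > 0 do
        table.sort(pending, function(a, b) return a.at < b.at end)
        local t = table.remove(pending, 1)
        now = t.at                         -- advance simulated time
        t.fn()
    end
end

-- sleep() must be called from inside a coroutine. It registers a callback
-- that resumes the current coroutine later, then yields back to the event loop.
local function sleep(ms)
    local co = coroutine.running()
    set_timeout(ms, function() coroutine.resume(co) end)
    coroutine.yield()
end

-- Start a function as a cooperative "thread".
local function spawn(fn)
    coroutine.resume(coroutine.create(fn))
end

-- The linear version of the counting loop from earlier in the post.
local printed = {}
spawn(function()
    for i = 1, 5 do
        printed[#printed + 1] = i
        print(i)
        sleep(1000)
    end
end)

run_timers()
```

The same trick extends to connect, send and receive: register the callback, yield, and have the callback resume the coroutine with the result. Every extra layer costs RAM though, which is exactly where the 3K problem comes from.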