Category Archives: Programming

Object Recognition

The pixel classifier/blob detector and the coordinate mapper modules described in the previous posts establish a solid base for solving the central task of our vision processing system – object recognition. Thanks to the color coding, a large part of this task is already solved by the blob detector: every blob of orange should be regarded as a candidate ball, and every blob of blue or yellow as a candidate goal. Two things remain to be done: 1) we need to filter out “spurious” blobs (i.e. those that do not correspond to balls or goals), and 2) we need to extract useful features, such as the position of the ball(s) or the location of the goalposts.

Issues that need to be solved after blob detection

Finding actual balls and goals among the candidates

Illusory square

So how do we tell a “spurious” orange blob from one that corresponds to a ball? Most contemporary state-of-the-art computer vision methods, which deal with the task of classifying objects in an image, approach the problem by first extracting a number of features and then making the decision based on those. The features can be anything ranging from “average color” to “is this pixel a corner” to “whether there is a green stain in the vicinity”. Interestingly, human visual perception is believed to work in a similar manner – by first extracting simple local features from the picture and then combining them in order to detect complex patterns. By observing the picture on the right, for example, one can convince oneself that the presence of four local “corner” features in an image is sufficient to strongly mislead the brain into believing that a complete square is present there.

The choice of appropriate features is therefore of paramount importance for achieving accurate object recognition. The task is complicated by the fact that we want the computation to be fast. The problem of finding a set of easy-to-compute features which are, at the same time, sufficiently informative for recognizing Robotex balls and goals is highly specific to Robotex. Hence, there are no nice and easy “general”, reusable solutions. Those have to be found in a somewhat ad-hoc manner, by taking pictures of the actual balls and goals at various angles and experimenting with different options until a suitable accuracy is achieved. In fact, each of the Tartu teams came up with their own, slightly different take on the object recognition problem, and all of them seemed to work well enough. In the following we present our team’s approach, which should be regarded as just one example of many equally reasonable possibilities. It is by no means optimal, but it did well enough in most practical situations and at the competition.

Features for object recognition

For both ball and goal recognition (in other words, for filtering spurious blobs) we used four kinds of features (or filters): blob area, coordinate mapping, neighboring pixels and border detection. Let’s discuss those in order:

Blob area and size provide a simple criterion for filtering out unreasonably small or large blobs. For goal detection this is especially relevant, as we know that even when seen from far away and partially covered by an opponent, the goal must still be visible as a fairly large piece of color. Also, at the very end of our filtering procedure, if we still had several “candidate goals”, we would retain just the larger of the two.

Coordinate mapping is the second most obvious technique. We map the blob’s lower central pixel coordinates using our coordinate mapper to detect its hypothetical coordinates in the real world. If those are not within the playing field, we can safely discard the blob as a false positive. An additional useful test is to map the width and the height of the blob to its corresponding world dimensions. If those do not match the expected size, discard the blob. For example, a ball should not be larger than 10cm in height, even accounting for coordinate mapping errors, and a goal may not be smaller than 5 cm, etc.
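
For illustration, here is a rough sketch of what these first two checks might look like, using the CoordinateMapper class described in the next post. The field dimensions, the area limits and the 100mm width bound are made-up placeholder values, not our actual constants.

#include <QPointF>
#include <QRect>

// Sketch (not our exact code) of the blob area and coordinate mapping filters.
bool passesAreaAndPositionChecks(const QRect& blob, int area,
                                 const CoordinateMapper& mapper)
{
    // 1. Blob area: discard unreasonably small or large blobs
    if (area < 4 || area > 20000) return false;

    // 2. Map the blob's lower central pixel to robot frame coordinates
    //    and check that the result lies within the playing field
    QPointF world = mapper.toRobotFrame(QPointF(blob.center().x(), blob.bottom()));
    if (world.y() < 0 || world.y() > 5000 ||       // further than the far wall (mm)
        world.x() < -3000 || world.x() > 3000)     // too far to the side
        return false;

    // 3. Map the blob's width to world units and compare with the expected size
    QPointF left  = mapper.toRobotFrame(QPointF(blob.left(),  blob.bottom()));
    QPointF right = mapper.toRobotFrame(QPointF(blob.right(), blob.bottom()));
    if (right.x() - left.x() > 100)                // a golf ball is about 43mm wide
        return false;

    return true;
}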

The two filters above will do a fairly good job of hiding most of the irrelevant blobs, yet they will be helpless if a feature of an opponent’s robot (such as a red LED or a motor) resembles an actual tiny ball. A nice way of filtering out those cases is to look at the neighboring pixels. We simply count the proportion of non-green and non-white pixels directly around an orange blob (to be more precise, around the rectangle encompassing it). If this proportion is suspiciously high, we conclude that the blob can’t be a valid playing ball – a real ball lies on the field and is therefore surrounded mostly by green and white.
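
As an illustration, here is a minimal sketch of this check. The PixelClass enum, the classAt lookup, the 2-pixel margin and the sampling step are assumptions made for the sake of the example.

#include <QRect>
#include <initializer_list>

enum PixelClass { GREEN, WHITE, ORANGE, YELLOW, BLUE, OTHER };

// Fraction of pixels just outside the blob's bounding rectangle
// that are neither green nor white.
double nonFieldNeighborFraction(const QRect& rect,
                                PixelClass (*classAt)(int x, int y))
{
    int total = 0, nonField = 0;
    QRect r = rect.adjusted(-2, -2, 2, 2);             // a ring slightly outside the blob
    for (int x = r.left(); x <= r.right(); x += 4)     // top and bottom edges
        for (int y : { r.top(), r.bottom() }) {
            ++total;
            PixelClass c = classAt(x, y);
            if (c != GREEN && c != WHITE) ++nonField;
        }
    for (int y = r.top(); y <= r.bottom(); y += 4)     // left and right edges
        for (int x : { r.left(), r.right() }) {
            ++total;
            PixelClass c = classAt(x, y);
            if (c != GREEN && c != WHITE) ++nonField;
        }
    return double(nonField) / total;
}
// If the returned fraction is suspiciously high, the blob is not a valid ball.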

Pixels probed by the border locator

Finally, we must still be able to filter out spurious ball- and goal-like objects lying outside the field boundaries. A method that we call the “border locator” turned out to be invaluable here. Our border locator procedure starts at the pixel with the world coordinates (0, 0), that is, directly in front of the robot, and then moves in the direction of an object of interest (e.g. a candidate ball), making steps of a fixed length (e.g. 8cm in world coordinates) and probing the corresponding pixels. If 5 of those pixels in a row happen to be non-green before the target pixel is reached, we conclude that the edge of the field separates us from the object. The procedure is not perfect – if we are looking at a ball directly along a white line, the border locator will report it to be “outside the borders”. However, such situations are rare enough that we simply ignored them.
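
A sketch of the border locator follows, reusing the hypothetical classAt lookup and PixelClass enum from the previous sketch together with the CoordinateMapper from the next post; the 8cm step and the 5-probe threshold follow the text.

#include <QPointF>
#include <cmath>

// Returns true if the field border appears to separate the robot from the target.
bool isBehindBorder(const QPointF& targetWorld,
                    const CoordinateMapper& mapper,
                    PixelClass (*classAt)(int x, int y))
{
    const qreal stepMm = 80.0;                          // 8cm steps in world coordinates
    qreal dist = std::sqrt(targetWorld.x() * targetWorld.x() +
                           targetWorld.y() * targetWorld.y());
    QPointF dir(targetWorld.x() / dist, targetWorld.y() / dist);

    int consecutiveNonGreen = 0;
    for (qreal d = stepMm; d < dist; d += stepMm) {
        QPointF probeWorld(dir.x() * d, dir.y() * d);   // walk from (0,0) towards the target
        QPointF px = mapper.toScreen(probeWorld);       // ...probing the corresponding pixels
        PixelClass c = classAt(int(px.x()), int(px.y()));
        consecutiveNonGreen = (c == GREEN) ? 0 : consecutiveNonGreen + 1;
        if (consecutiveNonGreen >= 5)
            return true;                                // the field edge is in the way
    }
    return false;
}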

Note that all of the checks described above are very efficient. Indeed, the blob area and coordinate mapping checks require essentially around ten comparisons. The scan of neighboring pixels is also fast, because it suffices to check at most 100 pixels or so, spread out equally along the border of the rectangle. Finally, in the border location procedure with a step size of 8cm we are guaranteed to either reach the object or hit a white border in approximately 5m / 8cm ≈ 62 steps. Consequently, we spend at most 200 steps in total to analyze each candidate blob – and typically far fewer. Compared to a single pass over the whole image (which requires examining 640×480 = 307 200 pixels) this is nothing.

Extracting useful features

After we have filtered out the spurious blobs, we do some post-processing. For the balls, we do the following:

  • Check whether at least 75% of the neighboring pixels are white. If so, we mark the ball as being probably “near a wall”. Grabbing such balls is more complicated than others, so our algorithm ignores those in the beginning of the game.
  • Check whether the ball’s blob lies within a goal’s blob. If so, we mark the ball as being “in the goal”, which means we do not need to bother about it. This is not a very precise criterion, of course, but it works surprisingly well in 99% of cases.
  • We also label the blob as being a potential group of several balls (rather than just one ball) if its dimensions suggest so. This is, however, an imprecise test, which was never actually needed in practice.

For the goals, we have to detect the situation where we see the goal at an angle, which means our aiming region cannot simply be chosen to be the whole blob (see figure above). This particular task was one of the trickiest of the whole vision processing pipeline, and after trying numerous (fairly involved) approaches we ended up with the following unexpectedly stupid, yet fast and working, method (a code sketch follows the list below):

  • Split the rectangle encompassing the putative goal into two halves, left and right.
  • Sample 200 pixels randomly from each half and count how many of those are green.
  • If the proportion of green pixels in one of the halves is greater than in the other, presume we are viewing the goal at an angle and have to aim at the “greener” half.
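
For illustration, a sketch of this test, again using the hypothetical classAt lookup from the ball-filtering sketches; the fixed random seed and the margin of 10 samples are made-up details.

#include <QRect>
#include <random>

// Returns -1 to aim at the left half, +1 to aim at the right half,
// 0 if the goal appears to be seen (roughly) head-on.
int pickAimHalf(const QRect& goalRect, PixelClass (*classAt)(int x, int y))
{
    std::mt19937 rng(42);
    auto countGreen = [&](const QRect& half) {
        std::uniform_int_distribution<int> xs(half.left(), half.right());
        std::uniform_int_distribution<int> ys(half.top(), half.bottom());
        int green = 0;
        for (int i = 0; i < 200; ++i)                // 200 random samples per half
            if (classAt(xs(rng), ys(rng)) == GREEN) ++green;
        return green;
    };
    QRect left(goalRect.left(), goalRect.top(),
               goalRect.width() / 2, goalRect.height());
    QRect right(goalRect.left() + goalRect.width() / 2, goalRect.top(),
                goalRect.width() / 2, goalRect.height());
    int gl = countGreen(left), gr = countGreen(right);
    if (gl > gr + 10) return -1;     // left half is "greener": aim at it
    if (gr > gl + 10) return +1;     // right half is "greener": aim at it
    return 0;
}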

Summary and code

The modules of our vision system that we have just discussed actually look as follows:

class BallDetector {
public:
    // Construct a Ball detector object, providing references
    // to all the necessary components
    BallDetector(const CMVProcessor& cmv,
                 const CoordinateMapper& cmapper,
                 const GoalDetector& goalDetector,
                 const WallProximityDetector& wallProximity);

    void processFrame();    // Extract visible balls from frame
    void paint(QPainter* painter) const; // Paint for debugging

    QList<Ball> balls;      // Detected balls

protected:
    // Main filtering method:
    // Given a blob rectangle, checks whether it contains a ball
    Ball checkBall(const QRect& rect, int area) const;

    // ... some details omitted ... //
};
class GoalDetector {
public:
    GoalDetector(const CMVProcessor& cmv,
                 const CoordinateMapper& cmapper,
                 const WallProximityDetector& wallProximity);

    void processFrame();                 // Extract visible goals
    void paint(QPainter* painter) const; // Paint for debugging

    Goal yellow;    // Detected yellow goal (if any)
    Goal blue;      // Detected blue goal (if any)

protected:
    // Main method:
    // Given a rect, checks whether it is a valid goal
    Goal checkGoal(const QRect& rect, bool yellow, int area);

    // ... some details omitted ... //
};
class BorderLocator {
public:
    BorderLocator(const CMVProcessor& cmv,
                  const CoordinateMapper& cmapper);

    // Shoots a probe at a specific point in world coordinates
    // Returns a BorderProbe object with shot results
    BorderProbe shootAtPoint(QPointF worldPoint) const;

protected:
    // ... omitted ... //
};

An attentive reader will notice the WallProximityDetector module mentioned in the code. We shall come to it in a later post.

Coordinate Mapping

Knowing that a certain orange blob on the picture corresponds to a ball does not help us much unless we know how to drive in order to reach it. In order to be able to do that, we must compute where exactly this ball is located with respect to the robot. Thus, the second important basic component of our vision subsystem (after the color recognition/blob detection module) is the coordinate mapper.

Coordinate mapping

We shall assign coordinates to the points of the field. Coordinates will be local to the robot (we shall therefore refer to them as “robot frame coordinates”). That means the origin of the coordinate system (the point (0, 0), red on the image above) will be fixed directly in front of the robot. The point with coordinates (300, 200) will be 300mm to the right and 200mm to the front (the blue point on the image above), etc.

With the coordinate system fixed, each pixel on the camera frame uniquely corresponds to a point on the field with particular coordinates. The task of the coordinate mapper is to convert from pixel coordinates to robot frame coordinates and vice-versa.

Camera projection

Determining distance to a point

How do we perform the conversion? Let’s start with the distance, i.e. the Y coordinate. If you examine a typical view from our robot’s camera, you will easily note that the vertical coordinate of a pixel uniquely determines its distance from the robot. Points on the “horizon line” lie at infinite distance, and points become closer as you move lower along the picture. The relation between a pixel’s vertical coordinate (counted from the horizon line) and the actual distance to it is an inverse function:

ActualDistance = A + B/PixelVerticalCoord,

where A and B are some constants.

I will not bore you with the proof of this fact. If you are really interested, however, you should be able to come up with it on your own after reading a bit about the pinhole camera model, the perspective projection, and meditating on the following figure (here, p is the pixel vertical coordinate as counted from the horizon line and d is the actual distance on the ground).

Recovering distance from perspective projection

Finding the X coordinate (i.e. the distance “to the right”) is even easier. Note that as you approach the horizon, the pixel-width of a, say, 100mm segment decreases linearly with the vertical pixel coordinate.

Pixel width decreases linearly with pixel vertical coordinate

That means that the relation between a pixel’s horizontal coordinate and the corresponding point’s coordinate on the ground must be of the form:

ActualRight = C * PixelRight / PixelVerticalCoord

where C is some constant again.
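
Putting the two formulas together, the pixel-to-world conversion boils down to a handful of arithmetic operations. The following is a sketch, not our exact implementation: it assumes that pixel coordinates grow downwards and that horizonMidPoint is the screen position of the horizon directly ahead of the robot, as in the class declaration given below.

#include <QPointF>

// Sketch of the conversion from screen (pixel) to robot frame coordinates.
QPointF toRobotFrameSketch(const QPointF& screen,
                           const QPointF& horizonMidPoint,
                           qreal A, qreal B, qreal C)
{
    // Vertical pixel coordinate, counted downwards from the horizon line
    qreal p = screen.y() - horizonMidPoint.y();
    if (p <= 0)
        return QPointF(0, 1e9);       // at or above the horizon: "infinitely far"

    qreal distance = A + B / p;                                  // Y: distance ahead
    qreal right    = C * (screen.x() - horizonMidPoint.x()) / p; // X: distance to the right
    return QPointF(right, distance);
}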

Finding the constants

Calibrating the coordinate mapper

Once we’re done with the math, we need to find the constants A, B, C as well as the pixel coordinate of the horizon line to make the formulas work in practice. Those constants depend on the orientation of the camera with respect to the ground, i.e. the way the camera is attached to the robot. For robots that have their cameras rigidly fixed it is typically possible to compute those values once and forget about them. Telliskivi’s camera, however, is not rigidly fixed, because the smartphone can be taken out from the mount. Thus, every time the phone is repositioned (or simply nudged hard enough to get displaced), we need to recalibrate the coordinate mapper by computing the A, B and C values, corresponding to the new phone orientation.

To do the calibration, we need to “label” some pixels on the screen with their actual coordinates in the robot frame. The easiest approach for that is to lay out a checkerboard pattern (or any other easily detectable pattern) printed on a piece of paper in front of the robot. After that, a simple corner detection algorithm can locate the pixels, which correspond to the four corners of the pattern. As the dimensions of the pattern are known and it is laid out at a fixed distance from the robot’s front edge, the actual coordinates, corresponding to the corner pixels are also known. Hence, putting those values into the equations above lets us compute the suitable values for the A, B and C constants and this completes the calibration.
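
To make this concrete, here is a rough sketch of how the constants could be recovered from two labeled points (say, a near and a far corner of the pattern) plus one known horizontal offset. It assumes that the screen position of the horizon line is already known, which is a simplification – in reality it has to be estimated as part of the same fitting step.

#include <QtGlobal>

// Sketch: recover A, B and C from labeled calibration points.
// (p1, d1) and (p2, d2): vertical pixel offsets from the horizon and the known
// world distances of a near and a far pattern corner.
// (px, wx, p_for_x): a known horizontal pixel offset, the corresponding world
// offset, and the vertical pixel offset of that same corner.
void calibrateSketch(qreal p1, qreal d1, qreal p2, qreal d2,
                     qreal px, qreal wx, qreal p_for_x,
                     qreal& A, qreal& B, qreal& C)
{
    // d = A + B/p for both labeled points gives two linear equations in A and B:
    B = (d1 - d2) / (1.0 / p1 - 1.0 / p2);
    A = d1 - B / p1;
    // wx = C * px / p gives C directly:
    C = wx * p_for_x / px;
}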

On coordinate mapping precision

It is worth keeping in mind that due to the properties of perspective projection, the coordinates of faraway objects computed using the formulas presented above can be rather imprecise. Indeed, for objects further than 3–4 meters away, an error of a single pixel can correspond to a distance error of more than 20 cm. Such pixel errors, however, happen fairly often. For example, a shadow may confuse the blob detector into not including the lower part of a ball in the blob. Alternatively, the robot itself may tilt or vibrate due to the irregularities of the ground – this shakes the camera and shifts the whole picture by a couple of pixels back and forth. Luckily, this discrepancy is usually not an issue as long as our main goal is to chase balls and kick them into goals. If better precision for faraway objects were necessary, however, it could be achieved by carefully tracking the objects over time and averaging the measurements.

Summary

To summarize, this is what the actual declaration of our coordinate mapping module looks like (as usual, with a couple of irrelevant simplifications):

class CoordinateMapper {
private:
    // Mapping parameters we've discussed above
    qreal A, B, C;
    // Screen coordinates of the midpoint of the horizon line
    QPointF horizonMidPoint;
public:
    // Load parameters from file / Save to file
    void load(const QString& filename);
    void save(const QString& filename) const;

    // Initialize parameters from the four pixel coordinates
    // of a predefined pattern lying in front of the robot.
    bool fromCheckerboardPattern(QPoint bottomLeft, QPoint topLeft, 
                                 QPoint bottomRight, QPoint topRight);

    // The main conversion routines
    QPointF toRobotFrame(const QPointF& screen) const;
    QPointF toScreen(const QPointF& robotFrame) const;

    // Paints a grid of points visualizing current mapping
    // (you've seen it on the pictures above).
    // Useful for debugging.
    void paint(QPainter* painter) const;
};

In addition, we also have a CheckerboardDetector helper module of the following kind:

class CheckerboardDetector {
public:
    enum CornerId { TOPLEFT, TOPRIGHT, BOTTOMLEFT, BOTTOMRIGHT };

    void init(QSize size);                 // Initialize module
    void processFrame(const uyvy* frame);  // Detect pattern
    void paint(QPainter* painter) const;   // Debug painting

    bool failed() const;                   // Was detection successful?
    const QPoint& corner(CornerId which) const; // Detected corners
private:
    // ... omitted ...
};

Pixel Classification and Blob Detection

The task of the vision processing module is to detect various objects on the camera frame. In its most general form, this task of computer vision is quite tricky and is still an active area of research. Luckily, we do not need a general-purpose computer vision system for the Robotex robot. Our task is simpler because the set of objects that we will be recognizing is very limited – we are primarily interested in the balls and the goals. In addition, the objects are color-coded: the balls are orange, the goals are blue and yellow, the playing field is green with white lines, and the opponent is not allowed to color significant portions of itself into any of those colors.

Robotex color coding

Making good use of the color information is of paramount importance for implementing a fast Robotex vision processor. Thus, the first step in our vision processing pipeline takes in a camera frame and decides, for each pixel, whether the pixel is “orange“, “white“, “green“, “blue“, “yellow” or something else.

Recognizing colors

Firstly, let me briefly remind you that each pixel of a camera frame represents its actual color using three numbers – the color’s coordinates in a particular color space. The most well-known color space is RGB. Pixels in the RGB color space are represented as a mixture of “red”, “green” and “blue” components, with each component given a particular weight. The pixel (1, 0, 0) in the RGB space corresponds to “pure bright red”, the pixel (0, 0.5, 0) to “half-bright green”, and so on.

Most cameras internally use a different color space – YUV. In this color space, the first pixel component (“Y”) corresponds to the overall brightness, and the two last components (“UV”) code the hue. The particular choice of the color space is not too important, however. What is important is to understand that our color recognition step needs to take each pixel’s YUV color code and determine which of the five “important” colors (orange, yellow, blue, green or white) it resembles.

Pixel color classification

There are a number of fairly obvious techniques one might use to encode such a classification. In our case we used the so-called “box” classifier, because it is fast and an implementation was readily available. The idea is simple: for each target color, we specify the minimum and maximum values of the Y, U and V coordinates that a pixel must have in order to be classified into that target color. For example, we might say that:

Orange pixels:  (30, 50, 120) <= (Y, U, V) <= (160, 120, 225)
Yellow pixels: (103, 20, 130) <= (Y, U, V) <= (200, 75, 170)
... etc ...
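
In code, the box test is literally a few comparisons per pixel. The sketch below uses the made-up threshold values from the example above; the bit-manipulation version used by CMVision is an equivalent but faster formulation of the same test.

struct ColorBox {                        // one "box" per target color
    unsigned char yMin, uMin, vMin;
    unsigned char yMax, uMax, vMax;
};

bool inBox(unsigned char y, unsigned char u, unsigned char v, const ColorBox& box)
{
    return y >= box.yMin && y <= box.yMax &&
           u >= box.uMin && u <= box.uMax &&
           v >= box.vMin && v <= box.vMax;
}

// Example usage with the (made-up) orange box from the text:
//   ColorBox orange = {30, 50, 120, 160, 120, 225};
//   if (inBox(Y, U, V, orange)) { /* the pixel is classified as orange */ }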

How do we find the proper “boxes” for each target color? This task is trickier than it seems. Firstly, due to different lighting conditions the same orange ball may have different pixel colors on the frame.

The same balls on the same field under different lighting conditions

Secondly, even for fixed lighting conditions, the camera’s automatic color temperature adjustment control may sometimes drift temporarily, resulting in pixel colors changing in a similar manner. Thirdly, shadows and reflections influence visible color: as you might note on the picture above, the top of the golf ball has some pixels that are purely white, and the bottom part may have some black pixels due to the shadow. Finally, rapid movements of the robot (rotations, primarily) make the picture blurry and due to this, the orange color of the balls may get mixed with the background, resulting in something not-truly-orange anymore.

Rotating robot's view

Consequently, the color classifier has to be calibrated for the specific lighting conditions. Such calibration can, in principle, be made automatically by showing the robot a printed page with a set of reference colors and having it adjust its pixel classifier in accordance. For the Telliskivi project we did not, unfortunately, have the time to implement such calibration reliably, and instead used a simple manual tool for tuning the parameters. Thus, whenever lighting conditions changed, we had to take some pictures of the playing field and then play with the numbers a bit to achieve satisfactory results. This did get somewhat annoying by the end.

Our tool for tuning the pixel classifier

After we have found suitable parameters, implementing the pixel classification algorithm is as easy as writing a single for-loop with a couple of if-statements. It is, however, possible to implement this classification especially efficiently using clever bit-manipulation tricks. Best of all, such an algorithm has already been implemented in an open-source (GPL) library called CMVision. The algorithm and the inner workings of the library are well described in a thesis by its author, J. Bruce. It is well worth reading if you ever plan on using the library or implementing a similar method.

Blob detection

Once we have classified each pixel into one of the five colors, we need to detect connected groups (“blobs”) of same-colored pixels. In particular, orange blobs will be our candidate balls, and blue and yellow blobs will be candidate goals.

Orange blobs highlighted

An algorithm for such blob detection is a bit too involved to describe here, but it is no rocket science – anyone who has taken an “Algorithms” course should be capable of coming up with one. Fortunately, the CMVision library already implements an efficient blob detector (look into the above-mentioned thesis for more details).
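
For the curious, a naive connected-component pass over the per-pixel class map could look as follows. This is only a sketch to illustrate the idea – CMVision’s run-length based detector is considerably faster – and the row-major layout of the class map is an assumption.

#include <vector>
#include <queue>

// Naive 4-connected flood fill over a row-major class map of size w*h.
// Writes a blob label into `labels` for every pixel of the requested color
// (label 0 means "not part of any blob") and returns the number of blobs.
int labelBlobs(const std::vector<int>& classMap, int w, int h,
               int color, std::vector<int>& labels)
{
    labels.assign(w * h, 0);
    int nextLabel = 0;
    for (int start = 0; start < w * h; ++start) {
        if (classMap[start] != color || labels[start] != 0) continue;
        ++nextLabel;                       // found an unvisited pixel: a new blob
        std::queue<int> q;
        q.push(start);
        labels[start] = nextLabel;
        while (!q.empty()) {
            int p = q.front(); q.pop();
            int x = p % w, y = p / w;
            const int  nb[4] = { p - 1, p + 1, p - w, p + w };
            const bool ok[4] = { x > 0, x < w - 1, y > 0, y < h - 1 };
            for (int i = 0; i < 4; ++i)
                if (ok[i] && classMap[nb[i]] == color && labels[nb[i]] == 0) {
                    labels[nb[i]] = nextLabel;
                    q.push(nb[i]);
                }
        }
    }
    return nextLabel;
}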

Why not OpenCV?

Some of you might have heard of OpenCV – an open-source, state-of-the-art computer vision library. It is widely used in robotics, and several Robotex competitors did use this library for their robots. I have a strong feeling, however, that for the purposes of Robotex soccer this is not the best choice. OpenCV is primarily aimed at “more complex” and “general purpose” vision processing tasks. As a result, most of its algorithms are either not enormously useful for our purposes (such as contour detection and object tracking), or are too general and thus somewhat inefficient. In particular, the use of OpenCV would impose a pipeline of image filters, where each filter requires a full pass over all pixels of the camera frame. This would be a rather inefficient solution (we know this from fellow teams’ experience). As you shall see in later posts, all of our actual object recognition routines can be implemented much more efficiently, without the need to perform multiple full passes over the image.

Summary

We have just presented the idea behind the first module of our vision processing system. The module is responsible for recognizing the colors of the pixels in the frame and detecting blobs. In our code the module is implemented as (approximately) the following C++ class, which simply wraps the functionality of the CMVision library.

class CMVProcessor {
public:
  CMVision cmvision; // Instance of the cmvision class

  CMVProcessor(const PixelClassifierSettings& settings);
  void init(QSize size);          // Initialize cmvision
  void processFrame(uyvy* frame); // Invoke cmvision.processFrame()
  void paint(QPainter* painter) const; // Paint the result (for debugging)
};

Telliskivi’s Brain: Overview

Now that we are done with the hardware details, let us move on to the “brain” of the Telliskivi robot – the software running on the smartphone. By now you can safely forget everything you read (if you did) about the hardware, and only keep in mind that Telliskivi is a two-wheeled robot with a coilgun and a ball sensor that can communicate over Bluetooth.

You should also know that Telliskivi’s platform understands the following simple set of textual commands:

  • speeds <x> <y>   – sets the PID speed setpoints for the two motors. In particular, “speeds 0 0” means “stop moving”, “speeds 100 100” means “move forward at a maximum speed”, “speeds 10 -10” means “turn clockwise on the spot”,
  • charge – enables the charging of the coilgun capacitor,
  • shoot – shoots the coilgun,
  • discharge – gracefully discharges the coilgun capacitor,
  • sense – returns 1 if the ball is in the detector and 0 otherwise.
    (*Actually, things are just a tiny bit more complicated, but it is not important here).

We now add a smartphone to control this platform. The phone will use its camera to observe the surroundings and will communicate with the platform, telling it where to go and when to shoot in order to win a Robotex Soccer game.

The software that helps Telliskivi to achieve this goal is structured as follows:

Telliskivi's brain

The Robot Controller

The Robot Controller is a module (a C++ class) which hides the details of Bluetooth communication (i.e. the bluez library and the socket API). The class has methods which correspond to the Bluetooth commands mentioned above, i.e. “speeds(a,b)”, “charge()”, “shoot()”, “sense()” and “discharge()”. In this way the rest of the system does not have to know anything at all about how exactly the robot is controlled.
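
A sketch of what this interface can look like is given below. The command methods mirror the textual protocol described above; the connection methods and the exact signatures are assumptions, not our literal code.

class RobotController {
public:
    bool connectTo(const QString& bluetoothAddress);  // open the RFCOMM socket
    void disconnect();

    // One method per textual command understood by the platform
    void speeds(int left, int right);   // sends "speeds <left> <right>"
    void charge();                      // sends "charge"
    void shoot();                       // sends "shoot"
    void discharge();                   // sends "discharge"
    bool sense();                       // sends "sense" and waits for "1" or "0"

protected:
    void sendLine(const QByteArray& line);   // write one command to the socket
    QByteArray readLine();                   // blocking read of one reply line
    int socketFd;                            // bluez / socket API details hidden here
};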

For example, a simple change in this class lets us use the same software to control a Lego NXT platform instead of Telliskivi. That platform has neither a coilgun nor a ball sensor (hence the shoot(), sense() and discharge() methods do not do anything), but it can move in the same way, so if we let our soccer software run with the NXT robot, the robot still manages to imitate playing soccer – it approaches balls and desperately tries to push them towards the goal. Looks funny and makes you wonder whether it is polite to laugh at physically disabled robots.

Obviously, the robot controller is the first thing we implemented. We did it even before we had Telliskivi available (we could use the Lego NXT prototype at that time).

Graphical User Interface (GUI)

The GUI is the visual interface of the smartphone app. Ours is written in QML, an HTML-like language for describing user interfaces. Together with the Qt framework this is the recommended way of making user applications for the Nokia N9. It takes time to get used to, but once you grasp it, it is fairly straightforward.

The overall concept of our UI is not worth delving deeply into – it is just a bunch of screens organized in a hierarchical manner. Most screens are meant for debugging – checking whether the Bluetooth connection works, whether the robot controller acts appropriately, whether the vision system detects objects correctly, and whether the various behaviours are behaving as expected.

In addition, there is one “main competition” screen with a large “run” button that invokes the Robotex soccer mode, and two “remote control” screens. One lets you use the phone as a remote control for the Telliskivi platform and steer it by tilting the phone (this is fun!). The other is meant to be used as a VNC server (so that you can log in to the robot remotely over the network from your computer, view the camera image and drive around – even more fun). For the curious, here are some screenshots:

Telliskivi smartphone app UI screenshots

Vision Processing

The vision processing subsystem is responsible for grabbing the frames from the camera (using QtMultimedia) and extracting all the necessary visual information – detecting balls, goals and walls. In our case, parts of the vision subsystem were also responsible for tracking the robot’s position relative to the field. Thus, it is a fairly complex module with multiple parts, and we shall cover those in more detail in later posts.

Vision processing

Behaviour Control

The last part is responsible for processing vision information, making decisions based on it, and converting them into actual movement commands for the robot – we call it the “behaviour controller”.

As the camera was the main sensor of our robot, we synchronized the behaviour controller with the camera frame events. That is, the behaviour controller’s main method (called “tick”) was invoked on each camera frame (which means approximately 30 times per second). This method would examine the new information from the vision subsystem and act in accordance with its current “goal”, perhaps changing the goal for the next tick if necessary. This can be written down schematically as the following algorithm:

on every camera frame do {
   visionProcessor.processFrame();
   currentGoal = behave(currentGoal, visionProcessor, robotController);
}

In a later post we shall see how, depending on the choice of representation for the “goal”, this generic approach can result in behaviours ranging from a simple memoryless single-reflex robot, through somewhat more complex state machines, up to the more sophisticated solutions suitable for the actual soccer-playing algorithm.

The ROS Alternative

As you might guess, all of our robot’s software was written by us from scratch. This is a consequence of us being new to the platform, the platform being new to the world (there is not too much robotics-related software pre-packaged for N9 out there yet), and the desire to learn and invent on our own. However, it does not mean that everyone has to write things from scratch every time. Of course, there is some good robotics software out there to be reused. Perhaps the most popular system that is worth knowing about is ROS (“Robot OS”).

Despite the name, ROS is not an operating system. It is a set of Linux programs and libraries providing a framework for the easy development of robot “brains”. It has a number of useful ready-made modules and lets you add your own easily. There are modules for sensor access, visualization, basic image processing, localization and control for some of the popular robotic platforms. In addition, ROS provides a well-designed system for establishing asynchronous communication between the various modules: each module can run in a separate process and publish events in a “topic”, to which other modules may dynamically “subscribe”.

Note that such an asynchronous system is different from the simpler Telliskivi approach. As you could hopefully understand from the descriptions above, in the Telliskivi solution, the various parts are fairly strictly structured. They all run in a single process, and operate in a mostly synchronized fashion. That is, every camera frame triggers the behaviour module, which, in turn, invokes the vision processing and sends commands to the robot controller. The next frame will trigger the same procedure again, etc. This makes the whole system fairly easy to understand, develop and debug.

For robots that are more complicated than Telliskivi in their set of sensors and behaviours, such a solution might not always be appropriate. Firstly, different sensors might supply their data at different rates. Secondly, having several CPU cores requires you to run the code in multiple parallel processes if you want to make good use of your computing power. Even for a simpler robot, using ROS may be very convenient. In fact, several Robotex teams did use it quite successfully.

In any case, though, independently of whether the robot’s modules communicate in a synchronous or asynchronous mode, whether they are parts of a framework like ROS or simple custom-made C++ classes, whether they run on a laptop or a smartphone, the overall structure of a typical soccer robot’s brain will still be the one shown above. It will consist of the Vision Processor, the Behaviour Processor, the Robot Controller and the GUI.

Summary

To provide the final high-level overview, the diagram below depicts all of the programming that we had to do for the Telliskivi project. This comprises:

  • about 800 lines of C code for the AVR microcontrollers,
  • about 7000 lines of C++ code for the Vision/Behaviour modules,
  • about 2000 lines of QML code for the smartphone UI elements,
  • about 900 lines of Python code for the simulator,
  • about 500 lines of C++ code for the Vision test application (for the Desktop).
Telliskivi software

Bluetooth

OK, so we have made the chassis with a couple of motors, a coilgun, a ball detector, and the electronic circuitry to control all of this. This makes up the “body” of our robot and what remains is the “brain”. As you should know already, the brain in our case is a Nokia smartphone. The brain needs to be constantly communicating with the body – sending movement and shooting commands and reading ball sensor and perhaps motor sensor data. As we explained in one of the first posts, we need to use the Bluetooth protocol for such communication.

We thus had to purchase a separate module and connect it to our main microcontroller (attentive readers have already seen this module in one of the pictures). There is a fairly wide variety of Bluetooth modules available out there. Our criteria for choosing a suitable one were the following:

  • The module must have an integrated implementation of the full Bluetooth protocol stack. Bluetooth is a fairly complicated set of protocols, and there are some modules which only implement parts of it. The remaining parts would have to be implemented in software (i.e. in the microcontroller) and this would make our life difficult.
  • The module must use the UART protocol. This is the default serial protocol supported by our microcontroller. Most Bluetooth modules use it anyway.
  • The module must have a readable datasheet. Debugging a no-name module with no reference manual is not something we look for.
  • The module should be reasonably priced. Although ELFA is certainly not a place to look for “reasonably priced” things, we can use the knowledge that the cheapest module there (which happens to fit the previous criteria) costs €32.50.
  • Finally, the module might have some positive reviews on the internet.
BlueSMIRF Silver

SparkFun’s BlueSMIRF Silver happened to fit all of them (especially the last one), so we ordered it and so far it has been one of the less troublesome parts of our robot. Although it does have a detailed datasheet, pretty much none of that information was even needed, because the module does most of the work itself. All the microcontroller has to do is read and write bytes over the UART pins, to which the module is attached.

To read and write over UART we borrowed the code from the Teensy UART library and added a couple of simple wrappers around the low-level uart_getchar and uart_putchar methods. After this was done, we could complete the main loop of our robot’s microcontroller:

int main() {
    // ... initialization code ...

    char buf[32];

    while (true) {
        bool success = recv_command(buf, sizeof(buf));
        if (success) execute_command(buf);
    }
}

This function, together with the interrupt handlers mentioned in the post about motor control, is essentially all there is to our main microcontroller’s code.
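
To give an idea of the scale involved, recv_command can be as simple as the following sketch, built on top of the UART library’s uart_getchar (the newline terminator and the overflow handling are assumptions, not our exact code).

// Sketch: read characters into buf until a newline arrives; returns true
// when a complete, non-empty command fits into the buffer.
// uart_getchar() blocks until a byte is available in the receive buffer.
bool recv_command(char* buf, unsigned size) {
    unsigned pos = 0;
    while (true) {
        char c = uart_getchar();
        if (c == '\n' || c == '\r') {       // command terminator
            buf[pos] = '\0';
            return pos > 0;                 // ignore empty lines
        }
        if (pos + 1 >= size)                // buffer overflow: drop this command
            return false;
        buf[pos++] = c;
    }
}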

Serial Communication and Interrupts

If you will be implementing a similar solution, beware of one particular hidden reef. The UART communication in the Atmega uses interrupts to receive bytes. That is, every time a new byte arrives on the serial port, an interrupt handler is invoked. The code in this interrupt is responsible for storing the received byte into a buffer for further processing. While this interrupt handler is running, no other interrupts will be invoked. For example, if a motor encoder happens to send a pulse while a byte is being read, this pulse will not get counted, simply because the controller won’t be there waiting to catch it.
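
In essence, the receive interrupt looks roughly like this (a sketch, not the library’s actual code; the vector and register names depend on the particular Atmega chip):

#include <avr/interrupt.h>

volatile char rx_buf[64];             // circular receive buffer
volatile unsigned char rx_head = 0;

ISR(USART_RX_vect) {                  // invoked for every byte received over UART
    char c = UDR0;                    // read the byte from the hardware register
    rx_buf[rx_head] = c;              // store it for later processing...
    rx_head = (rx_head + 1) % 64;     // ...and return as quickly as possible
}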

This issue is even worse the other way around – if the microcontroller happens to stay “too long” in one of the other interrupts, it might simply miss an incoming byte. As a result, weird things start happening – the command “move” might get received as “mve”, or worse still, a command separator character will be missed and two commands will be concatenated into one.

There is really no good solution to this problem besides being careful and not writing code which might stay too long in an interrupt handler. At one moment we had an interrupt handler like that (the one which performed the PID computation in a timer), and it resulted in a fairly large number of communication errors. Surprisingly, the problem nearly disappeared after we rewrote the computation from 32-bit integers to 16-bit integers. It turns out that an 8-bit microcontroller can spend too much time simply adding and multiplying 32-bit numbers.

Bluetooth and Latency

A final word of warning is related to latency. Although the communication speed for a Bluetooth connection is quite good (in our case it was actually limited by the UART’s 115 kbps), the latency is not. That is, although you might easily send up to 15 000 single-character commands per second on a UART+Bluetooth connection, if you require (and wait for) a response to each command, the actual throughput can be somewhere around 30-60 commands per second.

In fact, this particular problem (and the fact that we discovered it too late to fix it) was one of the main reasons why our robot was somewhat “slow” during aiming in the final competition. We focused a lot on keeping the vision processing speed at 30 frames per second, as this was, according to general knowledge, the hardest part to get working fast. Unknowingly, we were at the same time unreasonably wasteful in communication. Whenever the robot had the ball in the dribbler, we would query the ball sensor far too often. As we would also wait for each sensor query to respond before proceeding with the computation, this (rather than the dreaded vision processing!) was the reason our actual processing speed went down to about 15–20 fps every time the robot was holding the ball. Those were, however, exactly the moments when the robot was aiming for the goal and needed as high a framerate as possible.

Operating the Coilgun

Now that we’ve figured out how to operate a motor, let us describe the second important electro-mechanical component of the robot – the coilgun. Previously, we have already discussed briefly the general idea and its mechanical realization. What remains are the minor implementation details.

Recollect that the general scheme of a coilgun (somewhat simplified, of course) is the following:

Coilgun

In order to make this scheme operable, we must replace the switches “Enable charge” and “Enable discharge” with transistors, connected to a microcontroller. In our case we had a separate microcontroller operating the coilgun, which means that we also needed to establish communication between the main controller and the coilgun controller. The resulting scheme is then, in principle, the following:

Coilgun control

The task of the coilgun microcontroller is rather simple: receive signals from the main microcontroller and set its output pins A and B accordingly. Our robot’s coilgun controller implemented essentially the following four functions:

void enableCharge() {
    set_pin(PIN_A, 1); // Enable charging of the capacitor
}

void disableCharge() {
    set_pin(PIN_A, 0); // Disable charging of the capacitor
}

void kick() {
    set_pin(PIN_B, 1); // Start discharging capacitor
    delay_ms(5);       // Wait 5ms
    set_pin(PIN_B, 0); // Stop discharge
}

void dischargeSlowly() {
    for (int i = 0; i < 20; i++) {
       set_pin(PIN_B, 1);
       delay_ms(1);       // Discharge a bit
       set_pin(PIN_B, 0); // Let the kicker be pulled back
       delay_ms(5);
    }
}

The last function (dischargeSlowly) is there to allow a graceful shutdown of the robot, where the capacitor discharges without the kicker having to jerk hard.

The communication between the main microcontroller and the coilgun may be implemented using various protocols. Originally, our coilgun PCB was meant to be controlled via the SPI protocol, but during one of the mishaps the corresponding pins burned out, and we ended up using a simpler solution, where three output pins of the main controller were directly connected to input pins of the coilgun board. A command was sent by specifying its code on the first two pins and sending a pulse on the third one.
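
On the main controller’s side, sending a command over this ad-hoc three-wire link then boils down to something like the sketch below. The pin names, the command code assignments and the pulse width are made-up; set_pin and delay_ms are the same kind of helpers as in the coilgun code above.

// Sketch: command codes 0..3 are put on two "code" pins,
// and a short pulse on the third pin tells the coilgun board to act.
enum CoilgunCommand { CMD_ENABLE_CHARGE   = 0, CMD_DISABLE_CHARGE   = 1,
                      CMD_KICK            = 2, CMD_DISCHARGE_SLOWLY = 3 };

void send_coilgun_command(CoilgunCommand cmd) {
    set_pin(PIN_CODE0, cmd & 1);          // low bit of the command code
    set_pin(PIN_CODE1, (cmd >> 1) & 1);   // high bit of the command code
    set_pin(PIN_STROBE, 1);               // pulse the strobe pin...
    delay_ms(1);
    set_pin(PIN_STROBE, 0);               // ...the coilgun board reacts to this pulse
}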

If you will ever be making your own coilgun…

It must be noted that the coilgun was perhaps the most complicated electromechanical part of our robot, and as this blog tries to make things sound as simple as possible, it sweeps some details under the carpet. If you, dear reader, are for some reason going to build a coilgun yourself (perhaps when taking part in one of the next Robotex/Robocup events), take the time to skim through the following sites first:

Safety notes

In order for the coilgun to be capable of hitting the ball with appropriate force, it must contain a reasonably large capacitor, and large capacitors can be dangerous. A 1.5mF capacitor, when charged to 250V, stores ½·0.0015·250² ≈ 47 joules of energy – comparable to the energy of a 5kg brick falling from a height of one meter. If such a capacitor short-circuits, all this energy can get released in a single explosion, and you do not want your fingers to be nearby. Even if your fingers are spared, your electronics can get irreparably damaged.

In order to avoid a capacitor explosion (there were at least three of those among the Tartu teams during the two months leading to Robotex), keep in mind the following:

  • You should not short-circuit a capacitor. Obviously, this will result in an immediate discharge of all its energy, which means an explosion. Of course, no one ever short-circuits a capacitor deliberately, but it is unexpectedly easy to do it accidentally.
  • You should never physically damage a capacitor. A damaged capacitor may short-circuit internally and explode for no apparent reason later on. Hence, if a capacitor falls on a concrete floor once, beware when reusing it.
  • A capacitor must be used with appropriate voltages and connected in accordance with its polarity. Invalid voltage or polarity may damage it internally, which will lead to an internal short circuit.

To be more specific, here are some educational stories:

  • Some guys were debugging their PCB using a digital oscilloscope and touched a “positive” end of the capacitor with a “ground” probe of the oscilloscope. As a result, there was a short circuit through the ground, boom.
  • Other guys were trying to fix something using a screwdriver right on a working robot with a charged capacitor. The screwdriver touches the metal cover of the robot (which most probably was connected to the “ground” of the electronics somewhere). Next, either due to a static discharge from the screwdriver, or due to its external potential acting to inversely polarize the capacitor, the boom happens.
    In fact, touching a working robot with a metal object is a bad idea for many more reasons. The two times when our team has managed to mess up the electronics were both related to someone trying to touch a working robot with a metal object. So beware.
  • Yet other guys have attached a capacitor to their robot using just Velcro tape. The robot starts spinning, the tape does not hold, the capacitor slips off and touches some of the electronics with its terminals. No boom, but all of the electronics burned down.

 

Operating a Motor

After we have chosen the motors and motor drivers, we must connect them to a microcontroller and control their rotation. How do we do that?

The Microcontroller

In our particular case we used a microcontroller chip mounted on a custom-printed circuit board together with the motor drivers, which we ended up not using anyway. Thus, our life would actually have been easier if, instead of using that custom board, we had just bought some well-known pre-made microcontroller board. The popular choices include Arduino, Teensy, Baby Orangutan, Basic STAMP, ARM mbed, TI LaunchPad, ST Discovery, and anything else you might google up using the keyword “microcontroller development board“. Hence, for simplicity, I shall avoid details specific to our case and assume a set-up with such a separate board.

For those of you not familiar with microcontrollers, it is enough to understand that a microcontroller is just a black-box chip with several pins (those are the metal “legs” of the chip). The microcontroller can be programmed to set the output voltage on some of the pins to certain values (usually either 0V, corresponding to “bit zero”, or 5V, corresponding to “bit one”). It is also possible to sense the input voltage on some other pins. Those two basic operations allow the microcontroller to communicate with other devices connected to it.

So, we have to connect some output pins of the microcontroller to the motor driver. Then, by setting appropriate output voltages on those pins, we shall be telling the motor driver to operate the motor as necessary. How exactly the motor driver chip should be connected and operated is specific to each chip, but in principle the assembly for one motor might look as follows:

Motor connection

Controlling the Motor

In the example above (which corresponds to the driver we used), two pins (A and B) are used to tell the driver in which direction to rotate the motor. The whole code for specifying rotation direction is thus as simple as:

if (rotate_forward) {
    set_pin(PIN_A, 1);
    set_pin(PIN_B, 0);
}
else {
    set_pin(PIN_A, 0);
    set_pin(PIN_B, 1);
}

Specifying the rotation speed is just a bit trickier. For that we must produce a pulse width modulation (PWM) signal on pin X. This simply means quickly switching the pin’s value between 0 and 1, so that on average the value “1” is kept a given fraction of the time. A “full PWM” would mean keeping pin X at “1” all the time – in this case the motors would rotate at their maximum speed. A “null PWM” means keeping it at “0” all the time, which corresponds to the motors not rotating. All values in between are also possible. If pin X is at value 1 a third of the time, the motors will rotate at approximately a third of their maximum speed.

PWM with 1/3 duty cycle

Most microcontrollers have built-in mechanisms for producing such a signal on some of the pins. We won’t delve into details of doing that – there are lots of tutorials out there already. For all practical purposes, the actual code for setting the motor speed boils down to

set_pwm_pin(PIN_X, speed); // speed = 0..255

Feedback

Are we done? No. Unfortunately, simply setting the motor rotation speed to a fixed number is not enough for precise control. For example, if we set the PWM speed inputs for both robot motors to the same value, despite all expectations, the robot will most probably not drive perfectly straight. Thus, to ensure exact control, we must constantly measure the actual rotation speed of the motor and adjust the PWM to keep the speed at a given value.

In order to measure motor rotation speed, our motor is equipped with a rotary encoder: a sensor, which generates pulses in accordance with the rotation. The faster the rotation – the more pulses are generated per second. To obtain this feedback information we first connect the motor encoder to an input pin of a microcontroller:

Motor with feedback

Next, we write a simple interrupt handler that counts all pulses that come in on pin E:

volatile unsigned int pulse_count; // Written in the interrupt, read in the timer below

ISR(INT0_vect) {     // Invoked on every encoder pulse arriving on pin E
    pulse_count++;
}

Finally, we set up a timer interrupt that will regularly examine the current pulse_count value, compare it with some target value, and adaptively update the PWM signal on Pin X so that the target rotation speed is achieved:

ISR(TIMER0_COMPA_vect) {
    current_pulse_count = pulse_count;
    pulse_count = 0; // Reset pulse counter

    // Adjust the motor PWM so that the target pulse count is achieved
    update_motor_speed(target_pulse_count, current_pulse_count);
}

PID controller

So how do we use the feedback and choose the appropriate PWM value for the motor speed in order to keep up with the target pulse count? This simple question has a whole discipline dedicated to answering it, known as control theory. The easiest answer that control theory provides us here is the PID controller.

The idea of a PID controller is generic and simple: we first measure the discrepancy between the target pulse count and the actual measured pulse count (the error):

error = (target_pulse_count - current_pulse_count);

In addition, we keep track of the error, accumulated over multiple iterations of the algorithm, and the difference between the error of the previous iteration and this iteration:

error_i = error_i + error;
error_d = error - prev_error;

The actual input to the motor driver is now computed as a linear combination of these three error values with some experimentally determined coefficients P, I and D:

new_pwm = P*error + I*error_i + D*error_d;

If the parameters are chosen well, the PID controller will magically keep the motor turning at the necessary speed.

So, to summarize the above, the update_motor_speed function may look approximately as follows:

void update_motor_speed(int target, int current) {
    error = (target - current);
    error_i = error_i + error;
    // Prevent the integral term from growing too large
    if (error_i > max_error_i) error_i = max_error_i;
    if (error_i < min_error_i) error_i = min_error_i;
    error_d = error - prev_error;
    prev_error = error;
    new_pwm = P*error + I*error_i + D*error_d;
    set_pwm_pin(PIN_X, new_pwm);
}

Our actual function was somewhat more complicated and included a couple of specific hacks and improvements, but the gist is still there.

Summary

To summarize, here is what you need to do to properly operate a motor:

  • Connect the microcontroller’s output pins to a motor driver, and the motor driver to a motor.
  • Use pins “A” and “B” (or whatever the motor driver specification requires) to set motor rotation direction.
  • Connect the motor’s rotary encoder sensor to an input pin “E” of a microcontroller.
  • Make sure the microcontroller is counting the pulses on pin “E”.
  • Set up a timer that reads the counted pulses and updates the PWM parameters on pin “X” to match the “target_pulse_count“.

The same logic applies for the second motor.

But where do the motor direction and the values of target_pulse_count come from? Those are specified by the higher-level logic running on the phone, of course. We’ll get to that eventually.

 

Setting up the Camera

 

The camera and image processing are crucial components of a successful soccer robot. Consequently, before we could even start to build our robot, we had to make sure the camera of the N9 wouldn’t bring any unexpected surprises. In particular, the important questions were:

  • Is the angle of view of the camera reasonably wide? Can we position the camera to see the whole field?
  • What about camera resolution? If we position a ball at the far end of the field (~5 meters away), will it still be discernible (i.e. at least 3–4 pixels in size)?
  • Can it happen that the frames are too blurry when the robot moves?
  • At what frame rate is it possible to receive and process frames?

Answering those questions is a matter of several simple checks. Here’s how it went back then.

Resolution and Angle of View

The camera of the N9 is capable of providing video at a framerate of about 30Hz at different resolutions, from 320×240 up to 1280×720. Among those, there are three options which make sense for fast video processing: 320×240, 640×480 and 848×480. The first two are essentially equivalent (one is just twice the size of the other). The third option differs in terms of aspect ratio and its horizontal and vertical angles of view. The difference is illustrated by the picture below, which shows a measuring tape shot from a distance of 10cm.

Different angles of view

We can see that the resolution 848×480 provides just a slightly larger vertical angle of view than 640×480 (102mm vs 97mm) at the price of significantly reduced horizontal angle of view (65mm vs 86mm). Consequently, we decided to stick with the 640×480 resolution.

Camera positioning
Phone mounting angle

From the picture we can also estimate the angle of view, which is 2*arctan(97/200) ~ 52 degrees vertical and 2*arctan(86/200) ~ 46.5 degrees horizontal. Repeating this crude measurement produced somewhat varying results, with the horizontal angle being as low as 40 and the vertical as large as 60 degrees.

Knowledge that the vertical angle of view is 60 degrees suggested that the phone should also be mounted at around 60 degrees – this provided the full view of the field. As we also needed to see the ball in front of the robot, we had to mount the phone somewhat to the back.

Image Processing Speed

The first code we implemented was just reading camera frames and drawing them on the screen. The code could run nicely at 30 frames per second. Additional simple image operations, such as classifying pixels by colors also worked fine at this rate. Something more complicated and requiring multiple passes over the image, however, could easily drag the framerate down to 20 or 10 fps, hence we knew early on that we had to be careful here. So far it seems that we managed to keep our image processing fast enough to be able to work at 25-30 fps, but this is a topic of a future post.

Camera Speed

One reason why the PlayStation 3 Eye camera is popular among Robotex teams is that it can produce 120 frames per second. And it is not the framerate itself that is important (it is fairly hard to do image processing at this rate even on the fastest CPUs). The important part is that the frames are shot faster and thus do not blur as much when the robot moves. So what about our 30 fps camera? Can it be so blurry as to be impractical? We used our NXT prototype robot (at the time, we did not have our “real” robot, not even as a 3D model) and filmed its view as it drove forward (at 0.4 m/s) or rotated (at about 0.7 revolutions per second). The result is shown below:

Moving forward
Rotating

The results are quite enlightening. Firstly, we see that there are no blurring problems with the forward movement. As for rotation, however, it is indeed true that even at a moderate rotation speed, anything further away than 50cm or so blurs beyond recognition. It is easy to understand, however, that this is not so much a limitation of a 30fps camera as a property of rotation itself. At just one revolution per second, objects even a meter away are already flying through the picture frame at 6.28 m/s. Even a 120fps camera won’t help here.

Size of the Ball in Pixels

OK, next question. How large is the ball at different distances? To answer that, we made a number of shots with the ball at different distances from the camera and measured the size of the ball in pixels. The results are the following:

Distance to ball (mm):         100  200  300  400  500  600  700  800  900  1000
Ball diameter in pixels (px):  190  105   70   55   45   37   33   29   26    22
Distance vs Pixel size

This data can be described fairly well using the following equation (the reasons for this are a topic of a later post):

PixelSize = 23400/(21.5 + DistanceMm)

Two observations are in order here. Firstly, a ball at a distance of 5m will have a pixel size of about 4.65 pixels, which is not too bad. Note that it would be bad, though, if we were to use a resolution of 320×240, as then it would be just 2 pixels – add some blur or shadows and the ball becomes especially hard to detect. Secondly, and more importantly, if we decide to use such an equation to determine the distance to the ball from its pixel size, we have to expect fairly large errors for balls that are further away than a couple of meters.
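
Inverting the fitted equation gives a quick (if imprecise at long range) distance estimate from the measured blob size; the constants are simply the fitted values from above.

// Distance estimate (in mm) from the measured ball diameter in pixels,
// obtained by inverting PixelSize = 23400/(21.5 + DistanceMm).
double distanceFromPixelSize(double pixelSize) {
    return 23400.0 / pixelSize - 21.5;
}
// e.g. a 22-pixel ball gives 23400/22 - 21.5 ≈ 1042mm, close to the measured 1000mm.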

So that’s it. Now we’ve got a feel for the camera and are ready for actual image processing.

Parallelizing development

It takes time to design and manufacture the robot’s mechanical parts. It takes time to order, solder together and debug electronics. The third important component of robot development is the main soccer playing algorithm, i.e. the smartphone software, and it is important to start working on it as soon as possible. But how do you go about developing it while the chassis is not available? We used two tricks here.

The simulator

While making the real chassis is time-consuming, making a “virtual one” can take less than a day. So that’s what we actually started with on the very first day of the course – developing a simplistic software simulator, emulating a two-wheeled robot on a football field. The robot can “sense” the balls on the field as well as the opponent’s goal using a simulated camera. It can move around by setting the speed of its two wheels.

Telliskivi Simulator

Such a simulated robot makes it possible to write a soccer-playing control algorithm, not unlike the one we shall be using on the real machine. The control algorithm can connect to this simulated robot via TCP/IP and send commands such as “wheels <a> <b>” (set wheel speeds). This is equivalent to the smartphone connecting to the chassis via Bluetooth and sending the same “wheels” command.
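
For illustration, here is a minimal sketch of how a control program might talk to the simulator using QTcpSocket; the port number and the newline terminator are assumptions about the protocol, so check the simulator source for the actual details.

#include <QTcpSocket>

// Sketch: connect to the simulator and drive the virtual robot forward.
void driveSimulatedRobot()
{
    QTcpSocket socket;
    socket.connectToHost("localhost", 5000);   // the port number is an assumption
    if (!socket.waitForConnected(1000))
        return;

    socket.write("wheels 50 50\n");            // set both wheel speeds
    socket.waitForBytesWritten(1000);
}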

The simulator is written in Python, and the core of it is about 350 lines of code, including comments and doctests. The code is hosted on github here, and everyone is welcome to use it and contribute. It is fairly easy to add custom robot models to it, and in fact, at one of the practice sessions we managed to have a small simulated soccer match against Team Spirit. We lost, because our attempts to introduce last-minute changes to our algorithm happened to break it completely. Thus, the simulator managed to simulate an actual emergency at a competition.

We would like to believe that our simulator was at least in part what motivated Team Spirit to take a go at simulator development themselves, and they ended up with a way more sophisticated piece of software with a better physics engine (JBox2d), better visualization facilities and perhaps better modularity (albeit with a more complicated architecture). It is also freely available from github, take a look.

The NXT prototype

Lego NXT prototype

The second trick that has been enormously helpful in the first steps of software development was to take a Lego NXT constructor kit, and hack up a two-wheeler with a simple camera mount. In fact, it was enough to take the “default” NXT robot, turn the position of the NXT brick around, and connect a couple of blocks on top of it for holding the phone.

The NXT brick can be controlled via Bluetooth. The protocol does not seem to be documented anywhere, but some peeks into the source code of the nxt-python package helped us decipher it. Thus, way before the mechanics and electronics of the actual Telliskivi robot were ready, we got the opportunity to work on the smartphone code: establishing the Bluetooth connection, checking connection latency and making the first attempts at ball detection with a camera mounted on an actual moving robot.

In fact, the N9-controlled NXT is a rather fun toy in itself!