Randa Report: The Fall of KDateTime

Standard

The main goal for me and Volker for this year Randa Meeting was to port KCalCore, our calendaring library, away from KDateTime. KDateTime/KTimeZone are classes for handling date, time and time zones which were used back in the KDE4 (and earlier) days when Qt’s QDateTime wasn’t good enough for us, mainly due to missing time zone support.

In Qt5 QDateTime finally gained all the necessary features we need which made it possible for us to leave KDateTime behind. But we couldn’t just go and replace “K” with “Q” everywhere – there are many subtle (and not-so-subtle) differences in API, behavior and some missing features of QDateTime that required us to carefully consider each step. The fact that we started working on this over 2 years ago shows how challenging task this was.

We did not expect to finish the port here, but once we dived into it, things went fairly well and I’m glad to say that after four days of intensive hacking, surrounded by Swiss Alps and mountains of chocolate, we succeeded. KCalCore is now free of KDateTime and KTimeZone and that in turn made (after some minor adjustments) the rest of KDE PIM free of KDELibs4Support. That’s a huge milestone for us.

Many thanks to John Layt for laying the initial ground for the porting and to Mario and Christian for the steady stream of chocolate and sweets to soothe our nerves :-)

If you want to help us to continue improving Kontact and other parts of KDE, please donate to the Randa fundraiser so that we can continue organizing similar productive sprints in the future. Thank you!

Akonadi – still alive and rocking

Standard

It’s been a while since I wrote anything about Akonadi but that does not mean I was slacking all the time ;) The KDE PIM team has ported PIM to KDE Frameworks 5 and Qt 5 and released the first KF5-based version in August 2015 and even before that we already did some major changes under the hood that were not possible in the KDE4 version due to API and ABI freezes of kdepimlibs. The KF5-based version of Akonadi libraries (and all the other KDE PIM libraries for that matter) have no guarantees of stable API yet, so we can bend and twist the libraries to our needs to improve stability and performance. Here’s an overview of what has happened (mostly in Akonadi) since we started porting to KDE Frameworks 5. It is slightly more technical than I originally intended to, sorry about that.

Human-readable formats are overrated

As you probably know Akonadi has two parts: the Server (that manages the data and resources) and client libraries (that applications use to access the data managed by the server). The libraries need to talk to the Server somehow. In KDE4 we were using a text-based protocol very similar to IMAP (it started as RFC-compliant IMAP implementation, but over the time we diverged a bit). The problem with text-based protocol and large amount of data is that serializing everything into string representation and then deserializing it again on the other end is not very effective. The performance penalty is negligible when talking to IMAP servers over network because the network latency hides it. It however shows when everything is happening locally. So we switched from a text-based protocol to a binary protocol. That means we take the actual representation of the data in the memory (bit by bit) and write it to the socket. The other side then just takes the binary data and directly interprets them as values (numbers or strings or whatever). This means we spent almost zero time on serialization and we are able to transmit large chunks of data between the server and the applications very, very efficiently.

Let’s abuse the new cool stuff we have for things we did not originally designed it for

The communication between clients and server needs to work in two directions. It’s not just the clients sending requests to server (and server sending back replice), we also need a mechanism for the server to notify the clients that something has changed (new event in a calendar, email marked as read, etc.). For this we were using DBus signals. The clients could connect to a DBus signal provided by the Akonadi Server and when something changed, the server notified the clients via the signal. However during initial synchronization or during intensive mail checks the amount of the messages sent over DBus by Akonadi was just too high. We were clogging the DBus daemon and the transmission of messages via DBus is not cheap either. But hey, we have an awesome and super-fast binary protocol, why not use that? And so we switched from using DBus for change notifications to sending those notifications through the same mechanism that we use for all other communication with the server. In the future it will also allow us tu even more customize the content of the notification thus further improving performance.

Pfff, who needs database indexes?

We do! Once we switched to the binary protocol we found that we are no longer waiting for the data from database to be serialized and sent over to client, but that we are waiting for the database itself! I sat down and look at EXPLAIN ANALYZE results of our biggest queries. Turns out we were doing some unnecessary JOINs (usually to get data that we already had in in-memory cache inside the Server) that we could get rid of. SQL planners and optimizers are extremely efficient nowadays, but JOINing large tables still takes time, so getting rid of those JOINs made the queries up to twice faster.

One unexpected issue I found was that the database was spending large amount of time on “ORDER BY … DESC” on our main table (yes, we query results in descending order – this way we can show the newest (= usually most relevant) emails in KMail first, while still retrieving the rest). PostgreSQL users will be happy to know that adding a special descending index sped up the queries massively. MySQL users are out of luck – although MySQL allows to create a descending index on a column, it does not really do anything about it.

Splitting libraries is too mainstream, we merge stuff!

One of the things that I’ve been looking forward to for a very long time was making the Akonadi server a private part (an implementation detail) of the Akonadi client libraries. In KDE4 versions we had to maintain a backwards compatibility of the Akonadi protocol (the text-based one I mentioned earlier) as it was considered a part of public API. This was extremely limiting and annoying for me as a maintainer, as it was making fixing certain bugs and adding new features unnecessarily hard. Historically Akonadi server was a standalone project and it was expected that 3rd party developers would write their own client libraries in their own toolkits/languages. Unfortunately that never happened and the KDE Akonadi client libraries were the only client libraries out there that were actively developed and used (there were some proof-of-concept GLib/Gtk client libraries, but never used seriously).

So, since KDE Applications 15.08 the Akonadi server has no public API and writing custom client libraries is not officially supported. The only official way to talk to the server is through the KDE Akonadi client libraries, which is now the only public API for Akonadi. This may sound like a bad decision, like closing ourselves down from the world, like if we don’t care about anyone else but KDE and Qt. And it’s sort of true – we were waiting for almost a decade for someone else to start writing their client libraries, but nobody did. So why bother? On the other hand the only actual and real user of Akonadi – KDE – benefits much more now – for example the binary protocol is optimized so that (de)serializing Qt types (like QString or QDateTime) is very efficient because we can use the format that Qt uses internally. If we were to be “toolkit agnostic”, we would have to waste time on converting the data to some more standard representation and nobody would win.

To finally get to the point: today I took the Akonadi client libraries (that lived in kdepimlibs.git) and merged them to akonadi.git repository, where the Akonadi server is – at least locally on my machine, still need to fix some build issues before actually pushing this, but I expect to do it tomorrow. In other words the entire Akonadi Framework now lives in a single self-contained git repository. This brings even more benefits, mostly from maintainer point of view. For instance we can now share more code between the server and the libraries that we previously had to duplicate or expose via some private shared library.

The kdepimlibs.git will still contain some libraries for now that we yet have to figure out what to do with, but I guess that eventually kdepimlibs.git will meet the same fate as kdelibs.git – being locked down and preserved only for historical reference.

The Cheese Dependency

In September last year the KDE PIM team also met in Randa in Swiss Alps. I was totally going to blog about it, but then other things got into way and I kept delaying it further and further until now. So with an awkward 5 months delay: huge thanks and hugs to the entire Randa meetings staff and one more hug to Mario just for being Mario. In Randa we met to discuss where KDE PIM should go next and how to get there. After several days on intensive talking we outlined the path into future – you probably read about it already in some of the blogs about AkonadiNext from Christian and Aaron, so I won’t go much into that.

To list some of the visible and practical results – we now use Phabricator to coordinate the work in the PIM team and to better communicate what is happening and who’s working on what. There’s a nice backlog of tasks waiting to be done, so if you want to help us make PIM better feel free to pick up some task and get to work! Furthermore we looked into cleaning up some of the old code and optimizing some critical code-paths – basically a continuation of an effort that started already during Akademy in A Coruña. Some of the changes were already implemented, some are still pending.

Lord of the PIM: The Return of The KJots

One of the major complaints we heard about the new KF5-based KDE PIM was the disappearance of KJots, our note-taking app. Earlier last year, on a PIM sprint in Toulouse, we decided that we need to reduce the size of the code base to keep it maintainable given the current manpower (or rather lack thereof). KJots was one of the projects we decided to kill. What we did not realize back then was that we will effectively prevent people from accessing their notes, since we don’t have any other app for that! I apologize for that to all our users, and to restore the balance in the Force I decided to bring KJots back. Not as a part of the main KDE PIM suite but as a standalone app. I have yet to make a first release that packagers can package, but it already builds and is reasonably usable. I’m not planning on developing the application very actively – I’ll keep it breathing, but that’s about it. That’s all I can afford. If there’s anyone who would be interesting in maintaining the application and developing it further (it’s a rather small and simple application), feel free to step up! When the app reaches certain quality level, we can start thinking about merging it back to KDE PIM.

Is that all?

Yes. No. Well, it’s all for today. There is much much more happening in KDE PIM – Laurent did tons of work on of refactoring and splitting the monolithic kdepim.git repository into smaller, better reusable pieces and now seems to be messing around Akregator, and Sandro is actively working on refactoring the email rendering code and calendaring. But I’ll leave it up to them to report on their work :) And of course there’s much more planned for the future (as always), but this blog post already got a bit out of hand, I’ll report on the rest maybe next time I “accidentally” have an energy drink at 11 PM.

And as always: we need help. Like, lots of it. KDE PIM might look huge and scary and hard to work on, but in fact it’s all rainbows and unicorns. Hacking on PIM is fun (and we are fun too sometimes!), so if you like KDE (PIM) and would like to help us, let’s talk!

Qt containers and C++11 range-based loops

Standard

Much has been written on teh interwebs about performance of iterations over Qt containers with Q_FOREACH vs. std iterators vs. Java iterators. However there is very little about how the new C++11 range-based loops work with Qt containers (or maybe I just suck at Googling, but well here I am…). Today I found out that there is a little catch that one has to be very careful about when using range-based loops with Qt containers.

Qt containers are all implicitly shared classes, which means that copying them is very cheap since only shallow copy occurs. When a shared copy is modified (or rather when a non-const method is called) it calls detach() and performs the expensive deep copy. When there are no other copies of the object (i.e. when the reference count is 1) no copying happens when detach() is called. We say that such instance is “not shared”.

To get to the point – the code below performs equally fast in both cases:

QStringList list{ "1", "2", "3", .... };
Q_FOREACH (const QString &v, list) {
   ...
}

for (const QString &v : list) {
  ...
}

However in the following code range-based loop will perform much worse than Q_FOREACH.

class MyClass
{
public:
   ...
   QStringList getList() const { return mList; }
   ...
private:
   QStringList mList;
};

...

Q_FOREACH (const QString &v, myObject.getList()) {
   ...
}

for (const QString &v : myObject.getList()) {
  ...
}

The difference between the first example and this one is that the QStringList in this example is shared, i.e. reference count of it’s data is higher than 1. In this particular case one reference is held by myObject and one reference is held by the copy returned from the getList() method. That means that calling any non-const method on the list will call detach() and perform a deep copy of the list. And that is exactly what is happening in the range-based loop (but not in the Q_FOREACH loop) and that’s why the range-based loop is way slower than Q_FOREACH in this particular case. The example above could be even simpler, but this way it highlights the important fact that returning a copy from a method means that the copy is shared and has negative side-effects when used with range-based loops. Note that if the method would return a const reference to QStringList, everything would be OK (because const …).

The reason for the speed difference is one peculiarity of Qt containers: they have a const overload of begin() which does not call detach(). Q_FOREACH internally makes a const copy of the list, so the const overload of begin() gets called instead of the non-const one.

On the other hand the range-based loop does not take any copy and simply uses the non-const version of begin(). As we explained above, calling non-const methods on shared Qt containers performs a deep copy. Only exception is when the container itself is const because then the const version of begin() is called and the code will behave the same as Q_FOREACH.

Ironically with stdlib containers (std::vector for example) the situation is exactly the opposite. std iterators are not shared classes so making a copy of an std container always performs a deep copy, but calling a non-const method does not trigger any copying. That means that Q_FOREACH, which always takes a copy of the container would be doing a deep copy in such case while range-based loop, which only calls begin() and end() would not be triggering any copying. Although std containers provide cbegin() and cend() methods to get const interators, there’s no need for the range-based loop to use them, since begin() and end() will always perform equally well on std containers.

To prove my point, here is the benchmark code I used. It’s an extended version of an older benchmark of Qt containers.

#include <QStringList>
#include <QObject>
#include <QMetaType>

#include <qtest.h>
#include <cassert>


enum IterationType
{
    Foreach,
    RangeLoop,
    Std,
    StdConst
};

Q_DECLARE_METATYPE(IterationType)


class IterationBenchmark : public QObject
{
    Q_OBJECT

private Q_SLOTS:
    void stringlist_data()
    {
        QTest::addColumn<QStringList>("list");
        QTest::addColumn<IterationType>("iterationType");
        QTest::addColumn<bool>("shared");

        const int size = 10e6;

        QStringList list;
        list.reserve(size);
        for (int i = 0; i < size; ++i) {
            list << QString::number(i);
        }

        QTest::newRow("Foreach") << list << Foreach << false;
        QTest::newRow("Foreach (shared)") << list << Foreach << true;
        QTest::newRow("Range loop") << list << RangeLoop << false;
        QTest::newRow("Range loop (shared)") << list << RangeLoop << true;
        QTest::newRow("Std") << list << Std << false;
        QTest::newRow("Std (shared)") << list << Std << true;
        QTest::newRow("Std Const") << list << StdConst << false;
        QTest::newRow("Std Const (shared)") << list << StdConst << true;
    }

    void stringlist()
    {
        QFETCH(QStringList, list);
        QFETCH(IterationType, iterationType);
        QFETCH(bool, shared);

        if (!shared) {
            // Force detach
            list.push_back(QString());
            list.pop_back();
        }

        int dummy = 0;

        switch (iterationType) {
        case Foreach:
            QBENCHMARK {
                Q_FOREACH(const QString &v, list) {
                    dummy += v.size();
                }
            }
            break;

        case RangeLoop:
            QBENCHMARK {
                for (const QString &v : list) {
                    dummy += v.size();
                }
            }
            break;

        case Std:
            QBENCHMARK {
                QStringList::iterator iter = list.begin();
                const QStringList::iterator end = list.end();
                for (; iter != end; ++iter) {
                    dummy += (*iter).size();
                }
            }
            break;

        case StdConst:
            QBENCHMARK {
                QStringList::const_iterator = list.cbegin();
                const QStringList::const_iterator = list.cend();
                for (; iter != end; ++iter) {
                    dummy += (*iter).size();
                }
            }
            break;
        }

        assert(dummy);
    }
};

QTEST_MAIN(IterationBenchmark)

#include "iterationbenchmark.moc"

$ moc iterationbenchmark.cpp > iterationbenchmark.moc
$ g++ iterationbenchmark.cpp `pkg-config --cflags --libs Qt5Core` `pkg-config --cflags --libs Qt5Test` --std=c++11 -fPIC -O3 --o iterationbenchmark
$ ./iterationbenchmark
********* Start testing of IterationBenchmark *********
Config: Using QtTest library 5.4.2, Qt 5.4.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.1.1 20150422 (Red Hat 5.1.1-1))
PASS   : IterationBenchmark::initTestCase()
PASS   : IterationBenchmark::stringlist(Foreach)
RESULT : IterationBenchmark::stringlist():"Foreach":
     48 msecs per iteration (total: 96, iterations: 2)
PASS   : IterationBenchmark::stringlist(Foreach (shared))
RESULT : IterationBenchmark::stringlist():"Foreach (shared)":
     48 msecs per iteration (total: 96, iterations: 2)
PASS   : IterationBenchmark::stringlist(Range loop)
RESULT : IterationBenchmark::stringlist():"Range loop":
     53.5 msecs per iteration (total: 107, iterations: 2)
PASS   : IterationBenchmark::stringlist(Range loop (shared))
RESULT : IterationBenchmark::stringlist():"Range loop (shared)":
     177 msecs per iteration (total: 177, iterations: 1)
PASS   : IterationBenchmark::stringlist(Std)
RESULT : IterationBenchmark::stringlist():"Std":
     51 msecs per iteration (total: 51, iterations: 1)
PASS   : IterationBenchmark::stringlist(Std (shared))
RESULT : IterationBenchmark::stringlist():"Std (shared)":
     179 msecs per iteration (total: 179, iterations: 1)
PASS   : IterationBenchmark::stringlist(Std Const)
RESULT : IterationBenchmark::stringlist():"Std Const":
     53 msecs per iteration (total: 53, iterations: 1)
PASS   : IterationBenchmark::stringlist(Std Const (shared))
RESULT : IterationBenchmark::stringlist():"Std Const (shared)":
     52 msecs per iteration (total: 52, iterations: 1)
PASS   : IterationBenchmark::cleanupTestCase()
Totals: 10 passed, 0 failed, 0 skipped, 0 blacklisted
********* Finished testing of IterationBenchmark *********

Both Q_FOREACH cases are equally fast because as we explained above, Qt always uses the const iterators and no deep copying happens. Range-based loop with non-shared list performs equally well, because even though it calls detach(), there are no copies to detach from and so no deep copy occurs. However range-based loop with a shared list is over 3 times slower, because detach() here will actually perform a deep copy. The same happens with for loop with non-const std iterators, which is basically just expanded version of range-based loops (range-based loops are just a syntactic sugar for for loops with non-const std iterators). For loops with const std iterators perform equally well as Q_FOREACH, because that is what Q_FOREACH does internally.

To sum this up, when using range-based loops with Qt containers:
Make sure the container is const …

// shared, but const, forces call to QStringList::begin() const,
// which does not call detach()
const QStringList list = objectOfClassA.getList();
...
for (const QString &v : list) {
   ...
}

… or make sure the container is not shared.

// shared and non-const
QStringList list = objectOfClassA.getList();
// call to non-const method causes detach() and deep copy,
// 'list' is now non-shared
list.append(QLatin1String("some more data"));
...
// calls non-const begin(), but detach() of non-shared
// containers does not perform deep copy
for (const QString &v : list) {
   ...
}

Note that this just moves the slow deep-copying outside of the loop, but the deep copy still occurs. The point is that you need to be careful not to create a new copy of the ‘list’ after it has been detached on line 5, but before passing it to the loop on line 9. Failing to do so would make the list shared again and the loop would trigger yet another deep copy.

I was very excited when range-based loops were added in C++0x and I’ve been using them in some new C++11 code I wrote since then. But in Qt-based code I’ll be reverting back to the much safer Q_FOREACH. While it is possible to have range-based loops as fast as Q_FOREACH as we’ve shown above, one has to be really careful and constantly think about whether the container is non-shared or at least const and use Q_FOREACH if not. For that reason using Q_FOREACH everywhere is much safer for now.

I know that this is not any ground-breaking revelation and many of you probably even know of it, but I hope that it will still be useful for people who are not aware of the implementation details of Q_FOREACH and range-based loops, or just like me did not realize the importance of difference between shared and non-shared container instance.