Thursday, December 22, 2011

STIR November Visualized in Treemap

The following picture shows the individual reach of Dutch internet labels in November 2011, rendered as a treemap (data source: STIR).

The size of the squares indicates relative 13+ reach.


The next picture shows the unique reach of label publishers:

Wednesday, December 21, 2011

The Paradox that is The Great Idea

David Vismans, Chief Technology Officer at Hyves (@dvismans)

Everybody has a great internet idea; most people actually think it is so good that they have to keep it to themselves. And then, when they see someone else has built a successful company on top of that idea they think "see, that could have been me".

The thing is, it could very likely not have been you.

Why?

There is something weird about great ideas: when you have one, you think you are halfway there. The rest is a matter of just doing it.

Illustrative of this point is that in today's fast-moving internet technology industry everybody wants to do a startup; everywhere we see these hip startup weekends drawing large groups of people who cannot resist the dog-whistle that is The Startup. The idea is there! Now let's do a startup to make it happen!

I am of the opinion that the idea is worth very little, and that is the reason that many startups fail or never even get off the ground. The only thing that matters in getting an idea off the ground is execution.

And that's why it's so hard to bring a good idea to life: execution is very hard.

Luckily, the list of things you need for successful execution of pretty much any internet-type idea is enumerable. I have created this list based on my experience at two major Dutch product development companies (Hyves and TomTom).

But first, what fundamentally makes a good idea, in my opinion:

Entropy

What?

The value of internet type products is in the new information they create. The idea here is that value comes from new information, and there is a (latent) need for this information.

We see many examples where there is no new information created: someone combines a couple of APIs and mashes up a website that allows you to see for instance "where the girls at". In this example they use the Foursquare API to locate where users that have their sex set to female are together in groups. This is of course funny, but there is no significant new information created - in information theory terms you can say that this product is very compressible. Which usually means that there is someone out there that can build this idea faster than you, because it's not hard.

And that's a thing to remember: if something is hard, you are probably doing something right. You are creating entropy, new information that has value for someone out there and that will be hard to copy, giving you an inherent edge.

Now given that your idea has the potential to create value, what do you need for execution on this great idea?

1. Team

The team creates the product, they are the product and they will therefore directly determine the success of the product.

The team will transform an abstract, vague idea into perfect information (machine executable code) and along the way make thousands of small decisions that affect the final shape of the product. They need to be able to make those decisions.

There is no trick here. You cannot get away with people who do not believe in the idea (they will make the wrong decisions) or with people who are not good enough at what they do (they will make the right decisions, but implement them wrong).

Hire only those that meet these requirements, and do not hire those that you have even the slightest doubt about. Think of it like this: every time you hire someone who does not match the above criteria, someone is secretly replacing parts of the engine of your brand new car with used parts. Your car will still run initially, but its quality is inherently being degraded and it will come to a stop in a much shorter time than you expect. Worse, you will not know when to expect it.

If you are lucky, you can develop an internet-type product alone, but you then have to unite two disciplines yourself: the discipline of Product Management and the discipline of Software and System Engineering. If you can't do that yourself, you will need others. And if you want to be fast in bringing your product to the market, you will need to scale your execution, so you will need more people than just you.

2. Iterative Product Development

Creating something new requires strong discipline in forcing yourself to be critical, and not sometimes, but all the time. The team needs to have this approach too. You need to play the devil's advocate on your own product very regularly and be ruthless about its qualities and the value it creates. The problem here, however, is that you are in love with it, and if all is well so is the team. Therefore you need to expose your product to people who are in love with other things, and not your product. They will be truly critical.

You need to build this feedback loop in from the start, from the idea phase all the way through to the future versions of the product. There are well-known mechanisms to ensure this happens, most of them captured under the over-hyped term "agile product development". This category of methods builds the feedback loop in. It forces you to release a minimal shippable product as soon as you can, to get feedback, make changes, and so evolve your product based on actual feedback. It will make you fail fast as well, but that is what you want: not to waste time.

Everybody in the team needs to understand this philosophy down to its fundamentals. If they do, you will have very little waste in your team, meaning that you are not going to be building stuff that has no value.

3. Bootstrapping

If you have the two previous points covered, you may now have an idea that creates value, a great team, and a process for iterating to the nearest optimal first version of your product with minimal waste.

Now comes the problem of bootstrapping. How do you get people to use your product? Even if it has been brilliantly implemented, it is still not creating any value if no one is using it.

Bootstrapping is the process of getting sufficient momentum on the use of your product so that it delivers the value anticipated and you can evolve it further. This can be extremely costly: in most cases it requires you to invest in getting eyeballs on your product, and only then, combined with the viral mechanisms that are available nowadays, may your product get traction. You need to figure out how you are going to bootstrap and gain sufficient momentum well before you start implementation, because it may turn out that you cannot afford it. It's then like building a great car in a building with no garage door.

--

That's it: only three things you need. But you may now appreciate why it is so hard to realize that idea you have, and why execution is key.

Getting a good team together is very hard. The market for Product Managers and Engineers is very tight, and if they are good these people can work anywhere.

In addition, most people have a very traditional, waterfall-like idea of product development in their genes. The iterative approach is relatively young, and only in the last couple of years have people started to truly understand it. Getting people to work in this pattern is very hard if they do not understand the fundamentals behind it and have never experienced the benefits.

Bootstrapping can be very expensive, and will require a lot of effort if you have no platform that already attracts a lot of traffic.

What about Hyves?

Careful: shameless plug follows!

There are few companies in the Netherlands that get all of this right. That is why I feel Hyves is an extremely interesting environment for anyone who is interested in online product development. We've got all three points covered.

Hyves has one of the best engineering teams in the Netherlands, meaning that for anyone who is interested in working in the Product Management discipline the first item on the shopping list can be checked right away.

Hyves more or less adopted Scrum as an iterative product development method two years ago, and for the past couple of months we have been busy evolving the implementation of this method to the next level. We are not there yet, but with respect to releasing early and often we are: this is ingrained in the minds of the whole team. We release products as often as weekly, and make changes based on user and focus group feedback.

With millions of users, Hyves has got the bootstrapping part covered as well. Like very few others, we can expose a new product to a mass of users and create value for them in a very short time.


Hyves has the ambition to grow into the best online product development company in the Netherlands. We are constantly building and evolving the team and methods we use. And note that the products we will be developing will not be limited only to the Hyves network, or even to the Netherlands. If this environment and way of working appeals to you, and you are an Engineer or a Product Manager, do not hesitate to contact me.

Emerson: "In every work of genius we recognize our own rejected thoughts, they come back to us with a certain alienated majesty."

Wednesday, October 12, 2011

HipHop for PHP at Hyves

HipHop is Facebook's open source C++ compiler for PHP. HipHop (also known as HPHP) will compile your PHP code to highly optimized C++ code, which you can then compile (with g++) into a big binary that will run your web site (it includes a web server). If you want to know more about HipHop and how it was created, you can check out the Facebook Engineering Blog article or the GitHub project.

After its introduction many bloggers wrote background articles about HipHop, but few seem to have actually used it in a production environment. At Hyves, we use HipHop to run our web servers and our daemons, which are also written in PHP. In this blog post, I will detail some of our experiences and results.

Compiler and interpreter

HipHop includes both an interpreter and a compiler. This is something that is not frequently mentioned in blog posts on HipHop. The interpreter (called HPHPi) can be used for development, saving you the hassle and the delays associated with compiling every time you change a single file. In general it has the same behavior as the compiled version of HipHop. It is a bit slower than using PHP with an opcode cache, but it is not that bad, and with a fast laptop it is still workable.

The compiler can be used to create an optimized binary for your code base. The Hyves code base is on the order of 3.5M LOC of PHP. The conversion from PHP to C++ has to be done by a single server, but we use a dedicated cluster of sixteen servers for compiling the C++ source code to an executable binary. This takes about seven minutes, including the building of the web site (e.g. minifying CSS and JavaScript files).

The resulting binary is approximately 500MB in size. Deploying it to our web servers one at a time would take too long, so we use a BitTorrent-based p2p deploy system instead. In case of an emergency, we can roll out a fix to our 300 web servers in approximately ten minutes.

Behavioral differences

The HPHP interpreter, the compiled version and the official PHP binary all have slightly different behavior. HipHop is about 99% compatible with pure PHP, but with 3.5M LOC the remaining 1% still amounts to a lot of problems. If you want to run HipHop, make sure that your unit tests are run against all three versions automatically, or you will drive yourself and your team members insane, especially in a mixed PHP / HPHP environment.
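One way to picture that automated check is as a comparison of the test output captured from each runtime against a reference runtime. The sketch below is a rough illustration only, not the tooling used at Hyves: the class name, the runtime labels and the sample output strings are all invented, and actually capturing the output of each runtime is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: not part of HipHop or of any Hyves tooling.
class RuntimeComparison {

    /**
     * Returns the runtimes whose captured test output differs from the
     * output of the reference runtime.
     */
    static List<String> divergingRuntimes(String reference, Map<String, String> outputByRuntime) {
        String expected = outputByRuntime.get(reference);
        List<String> diverging = new ArrayList<>();
        for (Map.Entry<String, String> entry : outputByRuntime.entrySet()) {
            boolean isReference = entry.getKey().equals(reference);
            if (!isReference && !Objects.equals(entry.getValue(), expected)) {
                diverging.add(entry.getKey());
            }
        }
        return diverging;
    }

    public static void main(String[] args) {
        // Invented sample output, for illustration only.
        Map<String, String> outputs = new LinkedHashMap<>();
        outputs.put("php", "42 tests, 0 failures");
        outputs.put("hphpi", "42 tests, 0 failures");
        outputs.put("hphp-compiled", "42 tests, 1 failure");

        System.out.println("Diverging from php: " + divergingRuntimes("php", outputs));
    }
}
```

Treating plain PHP as the reference is an arbitrary choice here; in a mixed environment any of the three could play that role, as long as every divergence is reported on each run.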

Some PHP 5.3 features are still not completely supported in HipHop, such as namespaces, of which nearly every feature is broken at the moment. Extensions such as PHP's SOAP extension have been ported, but some behavioral differences exist between the variants.

Converting the Hyves code base to run on HipHop took a big effort: several of our software engineers worked on this project for months.

HipHop has some really cool features. There is a call_user_func_async() function (it does exactly what it says), and you can have two versions of your web site running at the same time, so that you can deploy a new version without downtime. Also, you can catch fatal (E_FATAL) errors with their stack traces with HPHP, a feature sorely lacking from PHP.

There is only sparse online documentation of these specific features, so you will have to figure a lot of it out by looking at the source.

Performance

Performance of our HipHop-enabled web site has been very good. Our API in particular has benefited strongly, with API calls being twice as fast and requiring only a quarter of the CPU power compared to plain PHP. This has resulted in much faster mobile clients and better performance of the parts of our web site that use internal API calls.

In the top graph you can see our web servers' 95th and 99th percentile response times. We enabled HipHop for all our web servers in March 2011. The higher response times just before the switch are due to internal network congestion caused by some issues with the p2p deploy system we use.
The bottom graph shows the CPU load of our web servers, which dropped by a factor of four after the switch to HipHop. This will enable us to decommission a big portion of our web servers in the future.

Open source

HipHop is an open source project, available on GitHub. If you file a bug, Facebook's maintainers can be pretty responsive; well-reported bugs can be fixed in a matter of days. Of course you can always fix it yourself on GitHub and try to get the fix pulled into the project.

Although Facebook has reported that WordPress and MediaWiki are experimenting with HipHop, not many people actually seem to be using HipHop in production. For now, the community is pretty small outside of Facebook.

Hyves has also contributed to the HipHop project: a number of bug fixes and the ability to use distcc, so you can compile your web site on multiple machines.

Conclusion

All in all, the switch to HipHop has been a big success for Hyves. Although switching comes with a steep cost in terms of implementation and tooling, for a big web site such as Hyves it is definitely worth the effort.

Wednesday, September 7, 2011

Meaningless numbers and what unit tests are not

Introduction
The field of automated testing is a fascinating one with many facets to it. There are perhaps few other areas in the software development discipline (with the exception of software development methodologies, if I may) that teem with so many fallacies and myths.

In this series of blog posts I am going to share my experience of what works when it comes to building automated testing solutions for complex software products, based on what I saw work when testing an advanced IDE for model-driven development, online retail banking solutions, and the largest social network in the Netherlands, Hyves.


Meaningless numbers and what unit tests are not
Remember those days in the beginning of the last decade when it suddenly became fashionable to create automated tests? The days when the agile manifesto [1] was drawn up, when software engineers started talking of test-driven development (TDD), and Kent Beck’s JUnit became ubiquitous. It was in those days that the whole development organization of the company I worked at received an email that read something like this:
Dear all,

I just took a look at how we are doing with our automated test coverage and see that we have gone up to 54 automated unit tests. Well done! However this is not enough, we should have more. Please keep up the good work.

Kind Regards,
firstName lastName

Out of curiosity, one colleague decided to take a look at the last unit tests contributed by the email’s author. What he found looked similar to this piece of code:
import junit.framework.TestCase;

import com….InterfaceA;
import com….ClassBImplementingInterfaceA;

public class ModelCheckerTest extends TestCase {

    // Only asserts a fact the compiler already guarantees; no behavior is exercised.
    public void testAParticularChecker() throws Exception {
        ClassBImplementingInterfaceA checker = new ClassBImplementingInterfaceA();
        TestCase.assertTrue(checker instanceof InterfaceA);
    }

}
I still wonder how another hundred comparable unit tests could ever have benefited the real users of the product…
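For contrast, here is a minimal sketch of a test that exercises behavior instead of restating the type hierarchy. Everything in it is hypothetical: the Checker interface and the NameChecker class merely stand in for the elided InterfaceA and ClassBImplementingInterfaceA, and the rules being checked (no blank or duplicate names) are made up for the example. It is written without JUnit so that it runs standalone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameCheckerExample {

    /** Hypothetical checker interface, standing in for InterfaceA. */
    interface Checker {
        List<String> check(List<String> elementNames);
    }

    /** Hypothetical implementation: reports every blank or duplicate name. */
    static class NameChecker implements Checker {
        @Override
        public List<String> check(List<String> elementNames) {
            List<String> problems = new ArrayList<>();
            List<String> seen = new ArrayList<>();
            for (String name : elementNames) {
                if (name == null || name.trim().isEmpty()) {
                    problems.add("blank name");
                } else if (seen.contains(name)) {
                    problems.add("duplicate name: " + name);
                } else {
                    seen.add(name);
                }
            }
            return problems;
        }
    }

    /** Tiny stand-in for a test framework's assertTrue. */
    static void expect(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Checker checker = new NameChecker();

        // A valid model must not be flagged.
        expect(checker.check(Arrays.asList("Order", "Customer")).isEmpty(),
                "valid model was flagged");

        // Each broken rule must be reported, in the order encountered.
        List<String> problems = checker.check(Arrays.asList("Order", "", "Order"));
        expect(problems.equals(Arrays.asList("blank name", "duplicate name: Order")),
                "problems were not reported as expected: " + problems);

        System.out.println("all checks passed");
    }
}
```

Unlike the instanceof assertion, these checks can actually fail when someone changes what the checker does, which is the only reason to keep a test around.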

Of course, successful software engineers with real-world experience had by then long been building useful automated unit tests (aka component tests) [2], certainly long before JUnit was conceived of and the term TDD coined [3]. Nevertheless, I hope this story gives a taste of the excesses brought forth by the hype around automated testing back then.
Sometime after, my team was adding support for deploying applications (generated by our IDE) to a popular enterprise application server. The documentation of the server boasted the following passage:
[The application server's] final release passes our internal testsuite, the testsuite for [the application server's] web service stack and the >2200 web service tests that come with Sun's Compatibility Test Suite (CTS).
Wow, with more than two thousand automated tests and two testsuites into the bargain (most of which were unit tests), this must be a server you can safely deploy your application to, rest assured that it will deploy without errors and then run like a train. Well, the brutal facts of reality were a bit more bitter: some standards-conforming applications would simply not deploy to this server or would work incorrectly [4].

What could be wrong here? Nothing—only the perception that thousands of automated unit tests can give you any guarantee that the software product you want to use or purchase is usable for you.

Unit tests do have their uses and can bring a lot of value to your software development organization; it's just that they are more a means of making it easier for your development team to reliably and cost-effectively evolve and maintain its software product. They are simply not intended to directly benefit the end users of your product.

What’s next

In the next post I’ll take a look at what unit tests actually are and what value they can bring. Stay tuned.

References
[1] The agile manifesto
[2] Unit testing
[3] The Mythical Man-Month: Essays on Software Engineering by Frederick P. Brooks (Jan 1975)
[4] Are Vendors Becoming More in Charge of Java Enterprise Edition... or is Sun losing control over Java EE?