Today I open-sourced and published the Java persistence library that I've been developing and using for some months now.
I started working on it when I switched from Hibernate to something lighter for a project where memory footprint was an important constraint. I have since used it on most of my projects, and was happy enough with it to share it as open source.
The key points of the library are
There's nothing disruptive in this library, but it gathers all the concepts I like and leaves out all the painful stuff that degrades my productivity (XML, and I have also stopped fighting with HQL joins).
Because I wanted to avoid writing a website, I used the Maven "site" feature. I wish there were out-of-the-box support for DocBook or some other standard documentation format (rather than this APT thing), but the feature is handy to use and flexible enough to let me customize the layout the way I wanted. Maven's site:deploy is the killer feature, imho: it deploys the whole site to a remote server with a minimal amount of configuration work.
Anyway, here's the site: http://www.upfrost.org
I started thinking about VPS benchmarking while I was looking for a new server. I found out that Gandi provides a lot of useful information on their benchmark page, and that they use the UnixBench program to evaluate the processor equivalence of their shares.
Having trust issues myself ^^ I ran the program on a bunch of VPS I manage:
For the details of the results, check the full table:
NOTES:
I was really surprised by the good performance of Linode. Keep in mind, though, that all those providers are excellent and each provides a very different kind of service, while this benchmark only scores CPU and disks (not bandwidth, additional services, reliability, or capacity).
I couldn't test all the possible configurations and flavours of these providers, but hopefully this gives a good picture of their respective performance.
The other day, I happened to use one of my classes that manages preferences (from a properties file). Pretty common, I know. So common that Eclipse popped up the completion window with 11 (eleven) possible classes with that very same name. The class name is Node (because it creates a tree, and I severely lack imagination).
Wait. ELEVEN just in the path of a single project: I may not be the only one with imagination issues. How many Node classes are there in the world?
Pretty much impossible to know, but I figured that checking the whole Maven repository might give a rough idea. So I FTP'ed the whole maven2 central repository to grab the jars (that's for SCIENCE, guys!!), and wrote a script.
156, that's the number of classes that are called Node. But Node is also used as a morpheme in 4856 classes, out of the 340K in the whole maven repository. In other words, 1 out of 70 classes contains the morpheme Node. WOW. I'm feeling less lonely. Thanks, brothers!
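For what it's worth, a "morpheme" here is just one chunk of a CamelCase classname. A minimal sketch of the split, written as a shell function around awk: the function name split_morphemes is made up for this illustration, and the capital-letter test is the same one the script further down relies on.

```shell
#!/bin/bash
# Split a Java class name into "morphemes", one per output line.
# A new piece starts at every character that toupper() leaves unchanged
# (capital letters, but also digits), except at the very first position.
# split_morphemes is a hypothetical helper name, not part of the script.
split_morphemes() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == toupper(c) && i > 1) printf("\n")
            printf("%s", c)
        }
        printf("\n")
    }'
}

split_morphemes "TreeNodeImpl"
# Tree
# Node
# Impl
```

One side effect of this rule is that an acronym such as XML in XMLParser falls apart into the single letters X, M and L. The script drops one-character pieces when it counts morphemes, so only Parser would be counted for that class.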
So I extended the script to get the full figures, and created the Top 100 Java Morphemes with it.
So 1 class out of 22 contains the morpheme Impl. Haha, so much for those Java-bashers who complain that we over-engineer with interfaces and abstract classes: this is factual proof that Java coders can also implement real classes, at least 4.43% of the time! Yay!
The same Top 100 in a Web Cloud:
Think it looks familiar? Of course, that's what most of your code looks like. Now let's check the classnames, are we doing better?
The Token morpheme may not be the most used morpheme to create a classname (only #86), but it's the most common classname. We sure like parsing.
My smartest readers will probably notice some salient data
Regarding *Util* classes, the chart comes out as expected: if you're intending to create a StringUtil or FileUtil class, chances are there are already some out there that provide what you need, so don't miss the chance to duplicate it and create your own!
Here's the script used to calculate those numbers. Note that extra care was taken to avoid counting the same classname twice for a single project: a fully-qualified classname that shows up in several jars (or was re-jarred) is counted once, and so are near-copies inside the same project. Some developers create myproject.v1.SomeClass and myproject.v2.SomeClass, which are just copypasta and don't qualify as two distinct classes.
#!/bin/bash
# Where the jars are located
JARFOLDER=ftp
if [[ "clean" == "$1" ]]
then
    echo "Cleaning workfiles..."
    rm -f jars.lst allclasses.lst sorted.lst
    exit 0
fi
if [ ! -f jars.lst ]
then
    echo "Creating jar list..."
    find $JARFOLDER -type f | grep '\.jar$' >jars.lst
else
    echo "Jar list already exists, skipping"
fi
CURRENT=1
TOTAL=`cat jars.lst | wc -l`
echo "found" $TOTAL "jar files"
#
# Extract the list of classes from the jars
# rm -f allclasses.lst
if [ ! -f allclasses.lst ]
then
    cat jars.lst | while read -r LINE ; do
        # List the entries of the jar, keep top-level .class files only
        # (no META-INF, no inner classes), and turn paths into dotted names.
        zipinfo -1 "$LINE" | grep -v '^META-INF/' | grep '\.class$' | grep -v '^schema.system.' | grep -v '\$' | cut -d. -f1 | tr '/' '.' | sort | uniq >>allclasses.lst
        echo -n -e "done " $CURRENT "/" $TOTAL \\r
        CURRENT=$((CURRENT+1))
    done
else
    echo allclasses.lst already exists, skipping.
fi
#
# Each line contains "f.q.d ClassName" (first 3 package elements + classname),
# is then sorted and uniquified, and finally only the classname is output.
# This prevents classnames duplicated within the same project from being
# counted several times, but keeps them as duplicates when they appear to come
# from different projects (which is assumed to be the case when the first 3
# elements of the full classname are distinct).
# Classnames of 1 or 2 characters are skipped.
#
# For instance, we don't want project.v1.SomeClass and project.v2.SomeClass
# to be counted as 2, because they're from the same project and are likely
# to be copypasta.
#
if [ ! -f sorted.lst ]
then
    echo "Sorting the classes"
    cat <allclasses.lst | awk "BEGIN{FS=\".\";}{ if (length(\$NF)>2) {printf(\"%s.%s.%s %s\n\",\$1,\$2,\$3,\$NF);}}" | sort | uniq | cut -d' ' -f2 | sort >sorted.lst
fi
echo "found" `cat sorted.lst | wc -l` "distinct classes, now sorting..."
#
# Counts the occurrences of each class name, and outputs a CSV line
# containing the short classname and its appearance count.
echo doing classnames...
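# (the END block flushes the last group, which would otherwise be dropped)
cat sorted.lst | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\") {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\") {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >classnames.csv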
#
# Extract each word contained in the class name, sort and count, then
# output a CSV file similar to the one created above.
echo doing morphemes...
cat sorted.lst | awk 'BEGIN{FS="";}{for(i=1;i<=NF;i++){if ($i == toupper($i) && i>1) {printf("\n");} printf("%s",$i);} printf("\n");}' | \
sort | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\" && length(last)>1) {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\" && length(last)>1) {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >morphemes.csv
#
# All of the above, but just for google :-)
echo doing google...
cat <allclasses.lst | grep '^com\.google\.' | awk "BEGIN{FS=\".\";}{ if (length(\$NF)>2) {printf(\"%s.%s.%s %s\n\",\$1,\$2,\$3,\$NF);}}"| sort | uniq | cut -d' ' -f2 | awk 'BEGIN{FS="";}{for(i=1;i<=NF;i++){if ($i == toupper($i) && i>1) {printf("\n");} printf("%s",$i);} printf("\n");}' | \
sort | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\" && length(last)>1) {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\" && length(last)>1) {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >google-morphemes.csv
#
# Now some fun with common classes
echo doing base64...
cat sorted.lst | grep "Base64" | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\") {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\") {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >base64-classnames.csv
echo doing String...
cat sorted.lst | grep "String" | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\") {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\") {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >string-classnames.csv
echo doing Log...
cat sorted.lst | grep "Log" | awk "BEGIN{FS=\".\"; count=1; last=\"\";}{ C=\$NF; if (last==C) { count++ } else { if (last != \"\") {printf(\"%s;%d\n\", last, count);} last=C; count=1; }} END{ if (last != \"\") {printf(\"%s;%d\n\", last, count);} }" | sort -r -n -t ';' -k 2 >log-classnames.csv
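To see what the per-project de-duplication step does, here is a minimal sketch that runs the same awk/sort/cut chain on a handful of made-up classnames. The dedup_per_project function name and the sample packages are hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
# Keep one copy of each class name per "project", where a project is
# approximated by the first three elements of the fully-qualified name.
# Reads fully-qualified class names on stdin, prints short names on stdout.
# dedup_per_project is a hypothetical name used only in this sketch.
dedup_per_project() {
    awk -F. 'length($NF) > 2 { printf("%s.%s.%s %s\n", $1, $2, $3, $NF) }' \
        | sort -u | cut -d' ' -f2 | sort
}

# Made-up sample input: SomeClass exists twice in one project (v1 and v2)
# and once in another project.
printf '%s\n' \
    org.example.myproject.v1.SomeClass \
    org.example.myproject.v2.SomeClass \
    com.acme.tools.SomeClass \
    com.acme.tools.Node \
    | dedup_per_project | uniq -c
# Node is counted once and SomeClass twice (once per project), not three times.
```

Note that the heuristic only merges the v1/v2 copies when the version segment comes after the third element of the name, as in org.example.myproject.v1. With a shorter prefix such as myproject.v1.SomeClass, the first three elements already differ and the two copies are counted separately.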
The release is as early as the version number implies, but here it is. Swit 0.9.0 is a small library providing various things for Wicket: a button generation engine with a bunch of skins, a border generator for HTML/CSS, and a layout manager for liquid [1-3]-column layouts.
The fun part of implementing it was the button generator. It is not unlike the various ones available around the web, except that this one is primarily meant to be used in Java apps, to dynamically generate graphics that are seriously boring to draw manually with GIMP or Photoshop.

Swit Homepage: http://swit.kornr.net
Hey, I hope Quentin Tarantino won't steal my title and make a movie out of it! Anyway, Wicket's implementation of Ajax is so good that I just stopped using anything else to make my web client communicate with my server. There is one thing, however, that keeps Wicket sites from scaling nicely: memory consumption. Wicket by itself does not consume that much memory, but if you want to use the sweet Ajax components, you're stuck with stateful pages. And stateful pages plus lots of visitors usually means memory issues: it's not just your own data contained in the WebSession object that is stored in memory, but also the current page hosting the Ajax components, and the last n pages visited. If your pages are big, your sessions are big, that's the point.
Add to that a specific need to make sessions last longer than usual, so that users can stop using the site, come back after a long period of time, and still find their session available, and you'd soon be awaiting your users with fear.
So, to reconcile long sessions, limited-memory servers, and stateful Wicket, the best possible solution, besides asking your users not to tell their friends, is to apply a JDBC Jutsu to the session management of your servlet container. At least, that's how I solved my issue.
Tomcat provides a JDBCStore for its PersistentManager class (which is responsible for managing the sessions): this java jutsu just saved my server's memory from going out of control. Unlike the default org.apache.catalina.session.StandardManager that stores everything in your precious heap memory, the PersistentManager provides a lot of flexibility regarding the session storage. For instance, a typical configuration would keep the sessions in-memory, but passivate them into a database after a few minutes of inactivity (instead of consuming all this good memory for idle or disconnected users until their session expires).
The Tomcat documentation lacks a few examples, so here's mine: I wanted an unlimited number of sessions, with idle sessions passivated to the JDBC database after 120 seconds of inactivity (the values to customize are the upper-case placeholders).
<Host ......
  <Context path="/MYPATH" docBase="MY-APPLICATION.WAR">
    <Manager className="org.apache.catalina.session.PersistentManager"
             saveOnRestart="true"
             maxIdleSwap="120"
             minIdleSwap="-1"
             maxActiveSessions="-1"
             maxIdleBackup="-1">
      <Store className="org.apache.catalina.session.JDBCStore"
             driverName="com.mysql.jdbc.Driver"
             connectionURL="jdbc:mysql://localhost/MYAPP"
             connectionName="DATABASE-USERNAME"
             connectionPassword="DATABASE-PASSWORD"
             sessionTable="tomcat_sessions"
             sessionIdCol="session_id"
             sessionValidCol="valid_session"
             sessionMaxInactiveCol="max_inactive"
             sessionLastAccessedCol="last_access"
             sessionAppCol="app_name"
             sessionDataCol="session_data"
             checkInterval="60" />
    </Manager>
  </Context>
</Host>
Then in the database and schema specified in the configuration above, just add the following table:
(from the Tomcat Manual at http://tomcat.apache.org/tomcat-6.0-doc/config/manager.html):
create table tomcat_sessions (
  session_id    varchar(100) not null primary key,
  valid_session char(1) not null,
  max_inactive  int not null,
  last_access   bigint not null,
  app_name      varchar(255),
  session_data  mediumblob,
  KEY kapp_name(app_name)
);
Well, that's it, no more wicket-related memory issues.
Update after Thyzz comment (see below)
Wicket can actually use either its original HttpSessionStore (which stores everything in the servlet HTTP session) or its newer SecondLevelCacheSessionStore, which stores the PageMap on disk. Below are some metrics I gathered that compare the memory usage with and without Tomcat's JDBCStore:
To get those metrics, I took a web application with moderate memory usage and changed it so that the session store can be either the HttpSessionStore or the SecondLevelCacheSessionStore. Additionally, I added a new byte[1024*500] allocation to the session object to make the memory consumption artificially higher. (This is for the purpose of testing in this specific configuration; applications whose sessions have a really low memory footprint behave totally differently, and are unlikely to have any memory-related issue.) Then I ran a script on another computer that requests the front page (and all the resources of that page) in a loop.
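The load script itself isn't shown here. A minimal sketch of such a loop could look like the following, assuming wget is installed on the client machine; the hammer function, the FETCH variable and the example URL are made up for this illustration.

```shell
#!/bin/bash
# Command used to fetch one page (hypothetical default, override via FETCH).
# wget's -p option also downloads the images, stylesheets and scripts the
# page needs, and --delete-after discards the files once they are fetched.
# No cookies are kept between invocations, so every request looks like a
# new visitor to the server.
FETCH=${FETCH:-"wget -q -p --delete-after"}

# hammer URL COUNT: request URL COUNT times in a row.
hammer() {
    local url=$1 count=$2 i=1
    while [ "$i" -le "$count" ]; do
        $FETCH "$url" || echo "request $i failed" >&2
        i=$((i + 1))
    done
}

# Example (hypothetical host and path, not run automatically):
#   hammer http://test-server:8080/MYPATH/ 10000
```

Because no session cookie is sent back, each iteration should make the servlet container create a fresh session, which is what makes the memory usage climb.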
As a result, without the JDBCStore both HttpSessionStore and SecondLevelCacheSessionStore end up with an OutOfMemoryError. The memory usage is slightly better with the SecondLevelCacheSessionStore. I did not test the HTTP latency; a real benchmark would need to compare both memory and speed, but at least the figures show that the JDBCStore prevents the memory issue.