How to find information

What I do to let information find me

I am frequently asked about all kinds of programming problems, new technologies or simply some good ideas for a thesis or prototype. And I've noticed that colleagues or students often go out on information hunting expeditions. Now I am a heavy Google user as well but I have switched to newsletters and portals for ongoing topics. Information should find me automatically. I will list my sources of information (with comments) for the benefit of others. At the end I'd like to speculate on automatic information gathering methods (Google API, autonomous agents, products like Autonomy).

Information Sources

I use mostly nine kinds of sources (you will find links and references in the section below):

weblogs (blogs)

newsletters

newsgroups

mailing lists

list archives

portals

directories

search engines

conference papers

I do not use RSS feeds yet but this might change shortly (I found a good introductory article with examples in one of my newsletters). I might even integrate some of my sources into my site via RSS.

Weblogs are a fairly new communication platform with very interesting social features. The way they work is being investigated by collecting statistics on linking and citing, on how new messages are disseminated, on whether small clusters of cross-bloggers show up etc. A lot of blogs are just ego dumps but some are very interesting because they are written by real visionaries. Look below at the one on social software, for example. Other blogs which I like are Lisa Rein's OnLisaReinsRadar and Meg Hourihan's blog at Megnut. Another very good blog is run by my friend Andreas Kapp on Concentrator where you will find ideas on the future of the internet. Andreas is one of the few people who really understand what "being digital" means.

One of my favourites is Gunter Dueck, the former math prof. and now Data Mining Guru at IBM. He thinks against mainstream nonsense. Take a look at his blog at the Omnisophie Site.

A newsletter is a short textual representation of new pages on a site. In very concentrated form a newsletter informs you about new stuff and one click brings you to the new article. A newsletter creator needs to create useful abstracts of the new content which are short enough to let the reader quickly browse through them but long enough to give a faithful description of the full source. A newsletter is an extremely useful way to inform readers.

Newsgroups used to be more important in the past. They have been replaced by mailing lists, in many cases because of spam problems. But when I started working on a new topic (e.g. when I had to port a framework from OS/2 to NT) I first went through the proper comp.sys.... newsgroups to learn about existing problems and get some good hints. Sometimes there are more than 2000 postings available. I usually started with the last 1000-2000 messages and went through them quickly. Now this may sound like a big waste of time. Fact is: not knowing about a well-known problem can cause you much larger delays and is very frustrating. I know that some colleagues still favor the trial and error approach but working with alpha/beta releases of huge software packages like application servers or operating systems made me realize that I just need the collective experience made with those beasts.

Once you have gone through the past postings you are pretty much current on a technology and its problems. You should always read the FAQ of those groups before posting something. But don't be too shy to post: chances are that you will find help. Do use a special e-mail address for posting, to keep spam away from your main address. If your company blocks newsgroups at the firewall, use e.g. Google to read the newsgroups (look at the "groups" link in Google).

Mailing lists are the centralized version of newsgroups. The better ones are usually moderated and spam is blocked. An example is the xml-dev mailing list. Traffic can be very high on those lists. Still, if you start with a new technology (e.g. eclipse IDE), register for the associated mailing lists to learn about the latest bugs and problems or to meet interesting participants.

List archives are good for high-traffic mailing lists which are only sometimes relevant for your work. E.g. I receive the xml-dev mailings directly but for XML Schema or XSL questions I go to the respective archive and read the postings there. Otherwise my inbox would overflow in only a couple of days. Something in between the complete mailing list and archives are digests: basically an extract of the mailing list with sender and subject information. Good for high-volume lists. I found it hard to read through digests because many subjects do not really reveal a lot about the content. One must know the senders a bit to handle digests.

Portals are the most important sites with respect to a certain topic. A good example is TheServerSide.com which covers J2EE related things. Other examples are openp2p for peer-to-peer things, eclipse.org for the eclipse IDE etc. I noticed that I use some portals heavily for a certain time and then not for a much longer time. Still, you should know the important portals for your areas of work (I guess I should ask for those in tests (;-))

Directories are hierarchically organized collections of information, compiled by humans. I use the open directory project (DMOZ) sometimes - usually way too late, after spending too much time with search engines. You will find a lot of information directly relevant to your topic doing a directory search. Directories are an excellent way to get a quick overview within a certain field of work.

So why use search engines if there is DMOZ? As I said, I use google too much and DMOZ not enough. But search engines allow you to find only partially relevant information as well, and sometimes they retrieve things which a category based search in a directory would not have brought together: search engines are an association tool as well and help you get new ideas or find new associations. This process is by necessity vague and time consuming but also exciting. Just don't get lost too much (the dictionary effect, where you look for Z, stumble over B, D, G and at the end you've even forgotten that you were looking for Z).

I just said something about search engines above. I use google mostly but there are others as well, like kartoo.com - an associative engine. I just bought the book on 100 google hacks to optimize and possibly automate some of my searches.

I am quite sceptical about most conferences. Most of them seem to be an expensive way to waste your time. But I do like the OOPSLA conferences even though I've never been there. If I need to get into a new area of development quickly I usually try to find last year's OOPSLA workshop on this topic. My latest example is the workshop on Model-Driven Architecture where I downloaded about 15 short papers which gave me a fast intro into the current state of affairs.

A list of information sources

Clay Shirky

Clay has reached cult status by now and this is absolutely justified. I stumbled over his writings on peer-to-peer software and was astonished how he was able to express the core qualities of this new development in a few words. I liked his writings on small world effects as well as his latest articles on social software and group developments. Lots of good ideas for thesis works to be found.

Corante Tech News

A bit like the wired newsletter, corante covers the top tech news and hosts a number of important blogs, e.g. about Social Software.

Java-Channel newsletter

The following is a short example from this newsletter:

                       In this issue

1. Developing E-Business Interactions with JAXM 
2. Creating Richer Hyperlinks with JSP Custom Tags 
3. Match4J 
4. Struts Studio 
5. JarJar ClassLoader 
6. JAIMBot 
7. JMemProf 
8. Mobicq 
9. Java XTools 
10. Poesia 
11. JAX-RPC on the Sun ONE Web Services Platform Developer Edition,
Part...
12. Savannah: Project Info - gnu.inet.nntp 
13. Open-source Java Flash Remoting 
14. Simply Singleton 
15. BuddhaIq 
16. JServices 
17. JAIM 
18. Visual Information Broker Enterprise 
19. The Socket API in JXTA 2.0 
20. NeoView 


Member reviews on other resources:
1. Usefull! (4/5)
2. TRMI (5/5)

There is an abstract on every item below, together with a link to a site or downloadable piece of java. To me this newsletter is a source of inspiration for new project ideas, improvements in the development process etc. A mixture of product information and source code examples.

the morning paper

Every day a new paper is discussed. Try to get into the habit of reading them daily. Get a "papers please" group going at your faculty.

infoq

Excellent information by the makers of theserverside. Videos and presentations from qcon and others. Features special themes like architecture, scalability, agile etc.

Todd Hoff's portal on scalability and ultra-large sites

If you are interested in reliability, scalability, performance, new algorithms and availability, this is the site to go to.

IBM Developerworks newsletter(s)

Excellent information on new technologies like Java, XML, Linux, Application Servers etc. Most articles are 3-5 pages long and contain source code as well. You should also register for the newsletter on products if you work e.g. with IBM Websphere. Developerworks is turning into a first class repository. Don't miss their tutorials e.g. on Linux programming (LPI course materials) etc. Interestingly the developerworks articles are written in accordance with a DTD defined for this purpose and pdf versions as well as html are generated automatically. I'd love to have this implemented at our university.

O'Reilly Network newsletter on Java

There are more such newsletters from O'Reilly and you should register for those technologies which you use regularly. I like e.g. Open Peer-to-Peer. The onjava newsletter is similar to the developerworks one with respect to the mix of theory and source code examples.

Javaworld

Again a whole family of newsletters covering java core, networking, xml etc. Register at least for the core newsletter to learn about new language features (e.g. templating, garbage collection). If you are working with Java you should register for several newsletters on Java just to stay current.

TheServerSide: J2EE Portal and newsletter

If you work on the J2EE platform, register for this newsletter. TheServerSide is the main portal for everything related to J2EE. Excellent articles, forum and last but not least free books on EJB/J2EE design patterns. I don't need another site for J2EE stuff.

IBM Redbooks and newsletter

Free and excellent books on various technologies (application server, message oriented middleware, clusters and grids, LDAP, TCP and security). Most books are a mix of IBM product information and generic technology and are therefore valuable also for non-IBM users. You can buy some of those books in bookstores as well.

Free books on enterprise IT topics, more...?

Enterprise security infrastructure and directories (LDAP), application server architecture, administration and performance. Clustering, caching and enterprise integration using web services and service bus technology. If this is your world - read on.

XML-Developer mailing list

As I am an SGML/XML dinosaur, this list has been quite important to me. Unfortunately it is also a high volume list which sometimes discusses rather obscure topics as well. But almost everybody important in the XML area reads the list. You might want to join the mailing list or just register for the digest (which I did not find very useful). Another alternative is to only read the archive at XML dev archive.

If you do XSLT or XSchema work, register with the respective mailing lists. You can post your problems there and chances are VERY high that your problems can be solved. Again, all the major specialists in these areas participate.

Alphaworks site

Another top source for exciting prototype technologies ready for download. IBM's research is available from this site for testing purposes.

IBM Systems Journal

For in depth technical information on almost everything related to computer science I found the Systems Journal a very good source. I used it e.g. for my lecture on building web based e-business applications or on Java virtual machine implementations.

News and resources on Interactive Publishing (el.pub)

If you are interested in publishing you should register for this newsletter. It covers almost everything in publishing including digital channels and, as a special feature, European research efforts (the so-called framework programs). Excellent.

Edge.org

Nobel Prize laureates discussing the important stuff. With book recommendations. High value content with some time between updates.

Jeff Sutherland's object technology site

Jeff is somebody who sees things long before others do. He pointed me to google, kartoo and also to such important books as Clayton Christensen's "The Innovator's Dilemma" or books on Scrum, extreme programming etc. A low traffic, high value newsletter. Jeff also participates in OOPSLA workshops. The only thing I hate about it is that it is hosted via yahoo-groups. Use a different account to register for those spam-distributing sources.

WIRED Magazine's newsletter

Call me nostalgic but this is where I get most of my social and political news from. Excellent links to privacy related laws, new scientific discoveries and in general how to live in our wired and unwired world. Short and precise. Other people also recommend TheRegister for technical and everyday news.

Bruce Schneier's Security newsletter

Definitely something to read if you want to cut through the hype of security announcements. First class security info. Frequently I extract something from this newsletter for my lecture on advanced internet security. Thanks to Morgan Kinney from Comparitech for pointing me to the correct link. They run an interesting InfoSec Blog as well.

CERT newsletter with security alerts

This is my early warning system for new virus threats, overflow attacks etc. A source for serious information on new security problems even if it is not as current as it could be - for political reasons, e.g. to give companies a chance to fix things first. I forgot something: I also receive the Securityfocus newsletter once a month.

David Endler's quality site on security

David wrote some of the very best papers I ever read on security. Examples are wireless security, brute-forcing session ids or cross-site scripting. Low traffic newsletter available as well.

A production line for downloaded information

What are you going to do with all the information? I'd like to show you my production line for downloaded information. As I am a frequent traveller just printing out books, articles, papers etc. from the Internet does not work for me. I cannot read through large stacks of paper on the train without losing some pages and getting lost. Here is what I do: I bought a fast lexmark optra S1850 laser printer through ebay and a duplex unit for it as well. It runs as a network printer in our network and wasn't expensive. Cartridges are good for 17600 pages and can be bought cheaply in the aftermarket. Duplex printing is essential because of the reduced weight.

And I bought a ring binding system (a puncher) large enough to bind books with 500 sheets of paper. Those are available from Ibico, Renz and others and cost about 250 euro/dollar. Supplies for it (plastic ring binders, cover papers and transparencies) are available for little money on ebay. I use white plastic ring binders because I usually write the book title with a permanent marker on the back. A big advantage of plastic ring binders over all glue based techniques is that you can change/add to the content at any time - something I do frequently when I find new information about a specific topic. Some of my "books" start with only one article and then grow over a term. You can switch to a larger size binder any time.

As you can see the paperless office has not quite reached me - I just hate reading online. Not the least because I usually have a text marker and pen to highlight text or write down my comments (e.g. "slide" to denote that I will use this picture in one of my lecture slides).

Automated information finding

As I said above I do not use automated information finding yet, except for simple searches via search engines. Ebay offers a nice automated feature through stored searches. Once you've defined a useful search on ebay you can store it and ebay will run the search whenever new items are posted. If your pattern matches you will get a message from ebay telling you which new items match your query. I used this a couple of times when I was looking for parts for my yamaha FJ1100.
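
To make the stored-search idea concrete, here is a minimal sketch in Python. It is not how ebay implements the feature (I do not know that); the class name StoredSearch, the "all keywords must appear in the title" rule and the sample listings are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredSearch:
    """A saved query (hypothetical): every keyword must appear in a title."""
    owner: str
    keywords: list[str]
    seen: set[str] = field(default_factory=set)  # titles already reported

    def matches(self, title: str) -> bool:
        lowered = title.lower()
        return all(word.lower() in lowered for word in self.keywords)

    def check(self, new_titles: list[str]) -> list[str]:
        """Return matching titles that have not been reported before."""
        hits = [t for t in new_titles if t not in self.seen and self.matches(t)]
        self.seen.update(hits)
        return hits


# Made-up listings; a real service would run check() whenever items arrive.
search = StoredSearch(owner="me@example.org", keywords=["yamaha", "fj1100"])
first_batch = search.check([
    "Yamaha FJ1100 front fender",
    "Honda CB750 exhaust",
    "Brake lever for Yamaha FJ1100",
])
second_batch = search.check(["Yamaha FJ1100 front fender", "Yamaha FJ1100 seat"])
for title in first_batch + second_batch:
    print(f"notify {search.owner}: {title}")
```

The important design point is the "seen" set: the search is run again and again, but the owner is only told about items that are new since the last run.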

RSS will be an important topic for me and others in the future. I'd like to offer an RSS service for my site and also integrate other sites with mine through RSS. Sounds like a thesis to me (;-). Another thesis idea would be autonomous agents using the web-service based google api to harvest information.
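
Integrating another site through RSS boils down to reading its feed and picking out title, link and abstract for each item. A rough sketch, assuming an RSS 2.0 feed and using only the Python standard library; the feed content and the example.org addresses are invented, and a real integration would fetch the feed over HTTP and cope with malformed input.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 document standing in for a feed fetched from another site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example site</title>
    <link>http://example.org/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First article</title>
      <link>http://example.org/first</link>
      <description>Short abstract of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.org/second</link>
      <description>Short abstract of the second article.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list[dict[str, str]]:
    """Extract title, link and description of every item in the channel."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(f"{entry['title']} - {entry['link']}")
```

Each item is really a machine-readable newsletter entry: a title, a short abstract and one link to the full article.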

Autonomy is a product that uses Bayesian networks and other technologies to filter relevant information according to your personal profile - which it also creates itself through usage analysis. I once saw a demonstration of it but have not been able to use it in a project yet.
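
I do not know how Autonomy works internally, so the following is not its algorithm. It is a much simpler stand-in, a naive Bayes word classifier, that only illustrates the general idea of learning a profile from usage (which documents a reader kept or skipped) and then filtering new documents with it. The class name InterestProfile and the training texts are made up.

```python
import math
from collections import Counter


class InterestProfile:
    """Naive Bayes over words, trained on documents the reader kept or skipped."""

    def __init__(self) -> None:
        self.word_counts = {"kept": Counter(), "skipped": Counter()}
        self.doc_counts = {"kept": 0, "skipped": 0}

    def observe(self, text: str, label: str) -> None:
        """Record one usage event; label is 'kept' or 'skipped'."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        vocabulary = set(self.word_counts["kept"]) | set(self.word_counts["skipped"])
        total = sum(counts.values())
        # Prior: smoothed share of documents carrying this label.
        score = math.log((self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2))
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total + len(vocabulary) + 1))
        return score

    def is_relevant(self, text: str) -> bool:
        return self.log_score(text, "kept") > self.log_score(text, "skipped")


profile = InterestProfile()
profile.observe("peer to peer networks and xml schema", "kept")
profile.observe("j2ee application server clustering", "kept")
profile.observe("cheap watches special offer", "skipped")
profile.observe("win a free holiday offer", "skipped")

print(profile.is_relevant("xml schema for application server"))
print(profile.is_relevant("free watches offer"))
```

With so little training data the profile is of course very crude; the point is only that nobody had to tag the documents by hand.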

Last but not least there is also the battle between the markup people (semantic network, topic maps) who believe in users defining the meta-data for their documents and the statistics or collaboration based people (google, autonomy, amazon) who believe in automated ways to create the meta-data from existing documents. This is of course a rather hot debate. Personal experience makes me believe that in most areas the statistical or collaborative filtering approaches will prevail - simply because tagging your documents with meta-data is a rather laborious process. This does of course not question the value of topic maps at all. The real question is about the costs to create those maps. Another thesis?