Batfish, Just a Bunch of Functions
I have been programming in Ruby for a bit less than a year, but I have already accumulated a number of data structures and algorithms. Since they could probably be of some use to someone else, and since I don’t want to lose everything because of a failure of some sort, I decided to publish them on my GitHub. The name of the gem: batfish (the name comes from a random haiku generator). So far, only my implementations of BK-tree and trie are included, but more should follow as soon as I get more time to package them. For more information, you can browse the batfish documentation.
Trie
A trie is a data structure that is used to store an associative array where the array’s keys are strings. It has the same structure as any other tree, except that keys are not stored in nodes. Instead, each edge has a character associated with it and you browse the trie by going down the edges, one character at a time, until you reach the end of the key. The node you reach this way contains the value associated with the key.
Tries have several advantages over binary search trees. First, a trie lookup takes O(L) character steps, where L is the length of the key, regardless of how many keys are stored. A BST needs O(log n) key comparisons when it is balanced and up to O(n) when it is not, where n is the number of elements in the tree, and each comparison can itself cost up to O(L). A trie can also take less space, since keys that share a prefix share the corresponding nodes. Tries also have advantages over hash tables. First, the keys in a trie are ordered, which makes it a useful data structure for storing a dictionary. Lookups can also be faster, depending on the hash function and considering that collisions are possible with string hashes.
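To make the edge-by-edge walk concrete, here is a minimal sketch of a trie in Ruby. It is not the batfish implementation: the class name `SketchTrie` and its methods `insert`, `lookup` and `keys_with_prefix` are names made up for this illustration, and a hash of children per node is just one possible representation.

```ruby
# A minimal trie sketch (hypothetical names, not the batfish API).
# Keys are strings; the value lives in the node reached after following
# one edge per character of the key.
class SketchTrie
  Node = Struct.new(:children, :value, :terminal)

  def initialize
    @root = new_node
  end

  # Walk down one edge per character, creating missing nodes on the way,
  # then store the value in the node reached at the end of the key.
  def insert(key, value)
    node = key.each_char.reduce(@root) do |current, char|
      current.children[char] ||= new_node
    end
    node.value = value
    node.terminal = true
    self
  end

  # Follow one edge per character, so the work depends on the key length only.
  def lookup(key)
    node = find_node(key)
    node && node.terminal ? node.value : nil
  end

  # Because edges can be visited in character order, keys come out sorted.
  def keys_with_prefix(prefix)
    start = find_node(prefix)
    return [] unless start
    collect(start, prefix, [])
  end

  private

  def new_node
    Node.new({}, nil, false)
  end

  def find_node(key)
    key.each_char.reduce(@root) do |current, char|
      return nil unless current.children.key?(char)
      current.children[char]
    end
  end

  def collect(node, prefix, found)
    found << prefix if node.terminal
    node.children.keys.sort.each do |char|
      collect(node.children[char], prefix + char, found)
    end
    found
  end
end

trie = SketchTrie.new
trie.insert("bat", 1).insert("batfish", 2).insert("bk", 3)
trie.lookup("batfish")        # => 2
trie.lookup("ba")             # => nil ("ba" is only a prefix, not a stored key)
trie.keys_with_prefix("bat")  # => ["bat", "batfish"]
```

Note how "bat" and "batfish" share their first three nodes, which is where the space saving comes from.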
More information on tries can be found here.
BK-Tree
A BK-tree is a useful data structure for nearest neighbor lookup in discrete metric spaces. A metric space is any space that obeys the following rules, where d(a,b) is the distance between a and b: d(a,b) = 0 if and only if a = b; d(a,b) = d(b,a); and d(a,c) ≤ d(a,b) + d(b,c). The last rule is also known as the triangle inequality. It basically states that there is no shorter way to go from one point to another than the direct way. Examples of discrete metric spaces, that is, spaces where the distances are integers, are the integers with the absolute difference as the distance, or strings with the Levenshtein distance.
The BK-tree is constructed by inserting values one at a time. Starting at the root, you measure the distance between the value to insert and the current node, then go down the edge labeled with that distance and repeat. When a node has no edge for the computed distance yet, the value is attached to it as a new child on that edge. A lookup computes the distance d between the query and the current node, reports the node if d is less than or equal to the lookup threshold, and then goes down every edge whose label lies in the range d ± threshold. The triangle inequality guarantees that matches can only be found under those edges. It is thus possible to find all the nodes within a certain distance of a value without going through every node. However, the larger the threshold, the more nodes you have to visit.
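The insertion and lookup rules can be sketched in a few lines of Ruby. This is an illustration under assumptions, not the batfish code: `SketchBKTree`, `insert`, `search` and the `levenshtein` helper are hypothetical names, and the distance function is passed in as a block so that any discrete metric can be used.

```ruby
# Levenshtein distance between two strings (row-by-row dynamic programming).
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# A minimal BK-tree sketch (hypothetical names, not the batfish API).
class SketchBKTree
  Node = Struct.new(:value, :children)

  # The block must be a discrete metric, e.g. Levenshtein distance.
  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  # Go down the edge labeled with the distance to each node until that
  # edge does not exist yet, then attach the value there.
  def insert(value)
    if @root.nil?
      @root = Node.new(value, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(value, node.value)
      return self if d.zero? # value is already stored
      child = node.children[d]
      if child.nil?
        node.children[d] = Node.new(value, {})
        return self
      end
      node = child
    end
  end

  # Report every stored value within `threshold` of the query, visiting
  # only edges whose label lies in d - threshold .. d + threshold.
  def search(query, threshold)
    matches = []
    pending = @root ? [@root] : []
    until pending.empty?
      node = pending.pop
      d = @distance.call(query, node.value)
      matches << [node.value, d] if d <= threshold
      node.children.each do |edge, child|
        pending << child if (edge - d).abs <= threshold
      end
    end
    matches.sort_by { |value, dist| [dist, value] }
  end
end

words = %w[book books cake boo cape cart boon cook]
tree = SketchBKTree.new { |a, b| levenshtein(a, b) }
words.each { |word| tree.insert(word) }
tree.search("bok", 1) # => [["boo", 1], ["book", 1]]
```

The pruning test `(edge - d).abs <= threshold` is the triangle inequality at work: any value stored under an edge outside that range is necessarily farther than the threshold from the query.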
For more information on BK-trees, you can read the following article.
Wikinomics or Business 2.0
Collaboration takes an ever-growing place in the business world. Not only an Open Source phenomenon, collaborative development and production of goods is becoming a new reality. The new means of communication and the ever-growing knowledge industry reshape the way businesses interact with each other in the digital age. This opens the door to new opportunities that could, if successfully harnessed, lead to huge wealth for new businesses and for old ones that adapt. On the other hand, those who resist these changes may end up hitting a wall.
In their book Wikinomics: How Mass Collaboration Changes Everything, Don Tapscott and Anthony D. Williams describe a new business model, Business 2.0, that takes advantage of the new realities of the digital age. Their theory rests on four main principles: being open, peering, sharing and acting globally. They then use them to describe seven business strategies to harness the power of mass collaboration.
The Four Principles
The four principles are at the core of the Wikinomics theory. These ideas contrast with the principles of previous business models, which promote a closed vision of business. Conventional wisdom says companies should protect their intellectual property fiercely, keep research and development in-house and keep control over every step of the development cycle of their products. Wikinomics is all about enterprises opening themselves to the world and, as such, stands on different principles.
The first is being open, but openness has a lot of different meanings. Openness could mean openness in research. With science and technology evolving faster than ever before, it is not possible for a single company to do everything in-house. If it wants to stay competitive, it has to open its boundaries and harness ideas from the outside. Openness could also mean open standards. This is common in the software world, where standards like XML make it easier for companies to develop compatible products. It could also mean transparency. Disclosure of pertinent information to interested parties like shareholders or partners can only lead to improvements. This is also true with customers, where more information can lead to better feedback and to the creation of new ideas via product customization.
The idea of peering contrasts sharply with the common idea that corporations should be highly hierarchical to be functional. Peer production communities have been around for a while already. You can find them everywhere in the Open Source movement and on social networks. However, as our society evolves, this idea makes more and more sense in a business environment. Peering succeeds where hierarchical management fails by leveraging self-organization. By harnessing ideas and opinions from a wider group of people, companies open themselves to more opportunities.
Conventional wisdom says companies should control and protect their assets as much as they can. However, sharing may end up being more profitable. Sharing crucial R&D information can pay off if it leads to the discovery of the missing piece of the puzzle. This leads to win-win interactions between companies, such as pharmaceutical companies that gain a better knowledge of the human genome by giving away some of their discoveries in this field in return for other companies’ discoveries. This could also lead to the licensing of assets that are unused, but that can prove useful to someone else. This can also be extended to the field of computer science, where distributed systems can share the computing load between users in order to reduce the total load on a company’s servers.
The last principle, but not the least, is acting globally. Globalization is not new. It has been there since the end of the Second World War with the introduction of the GATT. However, the phenomenon has accelerated lately, with collaboration over long distances made possible by the widespread usage of new means of communication such as the Internet. This allows companies to tap into a bigger pool of human and physical resources. However, companies still tend to act locally in multiple places by opening branches that perform the same work in parallel. Acting globally means more than that. It means breaking down geographical boundaries and acting as a truly global company by having global workforces that work together, by putting forward global processes that minimize redundancies between different locations and, usually, by having a global IT platform that allows for seamless collaboration and sharing between distant locations.
Seven Business Strategies to Harness Mass Collaboration
Theory is always fun, but knowing how to apply it is usually better. Economics being more of an applied science than a pure one, every business model needs its share of practical strategies and processes. Here is a brief overview of the seven business strategies described in detail by Don Tapscott and Anthony D. Williams in their book to help enterprises harness the opportunities offered by mass collaboration.
“Peer Pioneers” or how thousands of dispersed volunteers can harness the power of peering and sharing to create fast, fluid, and innovative projects that outperform those of the largest and best-financed enterprises. Peer producers apply Open Source principles to create products usually made of bits. Good examples of this are the Linux community or the Apache project and how companies like IBM collaborate with them.
“Ideagoras” or how marketplaces for ideas, inventions, and uniquely qualified minds enable big companies such as Procter & Gamble to tap global pools of highly skilled talent more than ten times the size of their own workforce. These Ideagoras give companies access to a global marketplace that they can use to extend their problem-solving capacity. Good examples of this are InnoCentive and yet2.com.
“Prosumers” or the increasingly dynamic world of customer innovation, where a new generation of producer consumers considers the “right to hack” its birthright. These communities working around a common product can be an incredible source of innovation if companies give customers the tools they need to participate in value creation. A good example of this is LEGO and their LEGO Factory initiative.
“New Alexandrians” or how a new science of sharing will rapidly accelerate human health, turn the tide on environmental damage, advance human culture, develop breakthrough technologies, and even discover the universe all the while helping companies grow wealth for their shareholders. The New Alexandrians, scientists, researchers and enterprises from all around the world, are ushering in a new model of collaborative science that will lower the cost and accelerate the pace of technological progress in their industries. A good example of this is Intel and its industry-university collaboration program.
“Platforms for Participation” or how smart companies are opening up their products and technology infrastructures to create an open stage where large communities of partners can create value and, in many cases, create new businesses. These platforms turn a company’s products into the foundation of a highly synergistic ecosystem. Good examples of this are Twitter and its open API or Facebook and the Facebook applications platform.
“Global Plant Floor” or how even manufacturing-intensive industries are giving rise to planetary ecosystems for designing and building physical goods, marking a new phase in the evolution of mass collaboration. The plant floors harness the power of human capital across borders and organizational boundaries to design and assemble physical things. A good example of this is Boeing and the development process of its 787 Dreamliner.
“Wiki Workplace” or how mass collaboration is taking root in the workplace and creating a new corporate meritocracy that is sweeping away the hierarchical silos in its path and connecting internal teams to a wealth of external networks. These new workplaces increase innovation and improve morale by cutting across organizational hierarchies in all kinds of unorthodox ways. A good example of this is the Geek Squad program.
Wrapping It Up
Overall, Wikinomics is a pretty good book for everyone interested in the world of business. It takes a fresh look at a subject that evolves fast and it brings new solutions to problems that used to look impossible to solve.
DHTwitter: A Vision of a Distributed Twitter
First of all, I have to say that this post is about an idea, not code. Now that this is said, here is the idea. As I said in a previous post, Twitter is facing a problem that could lead to its extinction. More and more people use third-party clients to access Twitter, thus decentralizing the system while keeping the load on the servers. Fewer people using the Twitter interface means fewer possibilities for monetization and could ultimately lead to the company going bankrupt. The obvious solution to this problem would be to simply restrict access to the service by limiting the API. However, since closing itself to the world is never a good thing to do, something else has to be done. This is where the idea of DHTwitter comes into play.
The Classic Way
First, let’s look at Twitter as it is right now. Yes, Twitter is a social application and as such, each of the network’s nodes, each user, is connected to a certain number of other nodes. The apparent structure of the service is thus that of a peer-to-peer network or a decentralized network. However, from a technology point of view, this is not the case. Twitter is a fully centralized network.

Whatever action you want to perform, you have to go through the company’s servers. Every single request, whether it is through the Twitter public API or directly on the website, has to be processed by Twitter. All the clients do is parse the server’s response. All the hard work is done on Twitter’s side.

Why is this bad? Because Twitter’s users check often for new messages since they post a lot of messages. Also, Twitter’s open API led to the creation of many applications based on the service, the worst for the company being clients and automated services. First, a lot of clients, to stay up to date, will make frequent requests to the servers to see if something new was posted. This wouldn’t be too bad, for example, with a blog service where people only post once in a while, because the servers could use a cache system to reduce unnecessary processing. However, with its thousands of new tweets per minute (and growing), a caching system is hardly useful for Twitter. The second problem is automated applications such as Wordpress plugins or blog update services. These services also make a lot of requests: every time a blog page is loaded in the case of Wordpress, and at a certain rate for the update services. Again, each request sent by the service has to be processed on Twitter’s servers, thus requiring a lot of processing power without the user even seeing the little Twitter bird.
The Efficient Way
So how can we solve this problem? Well, since Twitter is, by its social nature, decentralized, why not try to decentralize the system? This is what peer-to-peer applications such as BitTorrent and Skype have been doing for a while, and it seems to work well. Starting with the second generation of peer-to-peer networks, decentralization became a key idea. This culminated with the integration of Distributed Hash Tables (DHT) in peer-to-peer protocols and clients. From Wikipedia:
Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption.
So how does this apply to Twitter? Well, with its minimalist structure, Twitter is the perfect candidate for a DHT system. DHTs use key-value pairs for data management and Twitter is just that, a key-value system. The key is your username, and the value your tweets. What makes this an even better solution is the simplicity of the service. Compared with Facebook activity or blog posts, tweets don’t include complex data such as images or text formatting. It is all about text, and small pieces of text. This not only makes storing users’ data easy, but it also requires less space. A simple XML file of an average size could be used to store a user’s full history of tweets and the lists of the people he follows and of his followers.
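As a rough illustration of how usernames could be spread over peers, here is a sketch of consistent hashing, one common keyspace partitioning scheme (the one used by Chord-style DHTs). The post does not pick a scheme, so this choice is an assumption, and `SketchHashRing`, `peer_for` and the peer names are hypothetical.

```ruby
require "digest/sha1"

# A minimal consistent-hashing sketch (hypothetical names, one possible
# keyspace partitioning among many). Peers and usernames are hashed onto
# the same ring; a username belongs to the first peer at or after its
# position, wrapping around at the end of the ring.
class SketchHashRing
  def initialize(peers)
    @ring = peers.map { |peer| [position(peer), peer] }.sort
  end

  def peer_for(username)
    target = position(username)
    entry = @ring.find { |pos, _peer| pos >= target } || @ring.first
    entry.last
  end

  private

  # Map any string to an integer position on the ring.
  def position(name)
    Digest::SHA1.hexdigest(name).to_i(16)
  end
end

ring = SketchHashRing.new(%w[peer-a peer-b peer-c])
ring.peer_for("some_username") # one of the three peers, always the same one
```

The property that matters for a network where peers come and go is the one from the Wikipedia quote: when a peer joins, only the usernames that now fall on the new peer change owner, and every other key stays where it was.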
The idea of DHTwitter is to create an additional layer of abstraction, this time between the Twitter API and the client or application used by the end user. This API would include the overlay network and the keyspace partitioning specifications needed to establish the DHT. Once this is done, the API would redirect requests to the Peer Cloud to see if the information is available before hitting Twitter’s servers. If the user wants to publish a tweet, then all the API has to do is publish it in the Peer Cloud as well as on Twitter’s servers, so that it is accessible to everyone, even people not using DHTwitter. By sending the information part by part and by sending the latest tweets first, transfer speed would be less of a problem. Also, keeping a user’s friends close to him in the overlay network would make request processing in the Peer Cloud faster.
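The request flow described above can be sketched as follows. Everything here is hypothetical: `SketchDHTwitterClient`, `FakeOrigin` and their methods are names made up for the illustration, a plain Hash stands in for the Peer Cloud, and no real Twitter API call is shown. The sketch also ignores how a cached timeline would be refreshed when its owner posts from elsewhere, which a real design would have to address.

```ruby
# Stand-in for Twitter's servers that counts how often it is hit.
class FakeOrigin
  attr_reader :hits

  def initialize(store = {})
    @store = store
    @hits = 0
  end

  def fetch_timeline(username)
    @hits += 1
    (@store[username] || []).dup
  end

  def post(username, text)
    @hits += 1
    (@store[username] ||= []).unshift(text)
  end
end

# Sketch of the DHTwitter layer: ask the Peer Cloud first, fall back to
# the origin servers, and publish new tweets to both.
class SketchDHTwitterClient
  def initialize(peer_cloud:, origin:)
    @peer_cloud = peer_cloud # anything responding to [] and []=
    @origin = origin
  end

  def timeline(username)
    cached = @peer_cloud[username]
    return cached if cached
    tweets = @origin.fetch_timeline(username)
    @peer_cloud[username] = tweets
    tweets
  end

  def publish(username, text)
    tweets = timeline(username) # make sure the Peer Cloud copy is complete
    @origin.post(username, text) # still visible to people not using DHTwitter
    tweets.unshift(text)         # latest tweets first
    @peer_cloud[username] = tweets
  end
end

origin = FakeOrigin.new("alice" => ["hello"])
client = SketchDHTwitterClient.new(peer_cloud: {}, origin: origin)
client.timeline("alice") # first call reaches the origin
client.timeline("alice") # second call is answered by the Peer Cloud
origin.hits              # => 1
```

With this shape, repeated reads of the same timeline cost the origin a single request, which is the load reduction the idea is after.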

This would seriously reduce the load on Twitter’s servers by distributing the lookup and parsing work throughout the Peer Cloud. It would reduce the number of server hits by connecting the users together, in the same way they connect in the social layer.

Possible Implementation
For now, this is only an idea. I am not planning on coding an implementation on my own in the near future. However, if someone is interested, please let me know. I would love to participate in the development. The biggest challenge in implementing DHTwitter is the lack of libraries for DHTs. Full system specifications exist; Kademlia, Pastry and Tapestry (Chimera) are good examples. However, there is a lack of solid implementations. Libraries are practically nonexistent and most practical implementations are hard-coded into existing applications, such as Vuze or LimeWire. Again, if you have thought of a way this could be implemented, please let me know.
Privacy Piracy
Yesterday, a friend of mine received a letter from Videotron, his Internet service provider, telling him that he should stop downloading copyrighted content. He immediately called me in a panic, not because he is afraid of Videotron’s threats (he is in law school), but because he is worried that someone is observing what he is doing on the Internet. I have to say that I would also be worried if I received a letter confirming that my ISP is spying on me. So like I said, he called me asking for ways to protect his private life from piracy by so-called anti-piracy groups. I thus decided to make a list of programs I know of that can help you protect your personal data from your ISP and/or external observers.
Tor
Tor is free software that helps you protect your privacy online. It is an implementation of onion routing that allows you to browse the Internet anonymously. It is a decentralized network, meaning that anybody can create an anonymous server and add it to the network. To make a long story short, onion routing systems work by encrypting packets multiple times, using a different key for each of several servers in the network. The packet is then sent to the first server, which can decrypt the first layer to find out where to send the packet next. The following servers then do the same thing until the packet reaches its final destination. This way, each server is only aware of the servers that come just before and just after it. No single server can thus tell both who the packet comes from and what its final destination is. From an ISP’s point of view, the packet just looks like it is going to a random server and its encrypted content looks like gibberish. Tor is probably the most popular anonymity network and it is the one I was using when I started college. A good front-end for the system is the cross-platform controller Vidalia.
JAP/JonDonym
JAP, the Java Anon Proxy, is similar in design to Tor. It also uses multiple layers of encryption to anonymize the users. The main difference is that JAP servers are not anonymous. When you use the service, you can choose between different Mix Cascades of servers. This allows you to choose who you trust and who you don’t. This is a major difference with Tor, with which you don’t know who owns the servers and whether they are keeping logs. However, this has a downside. Since everybody knows who owns the servers, this leaves them vulnerable to attacks from hackers or the government. There is also a known backdoor implemented in the software following an intervention by the German Federal Criminal Police Office. It is still a good service though, and one that I used for my last year in college after the IT service found out how to block Tor.
I2P
I2P stands for the Invisible Internet Project. It is also a system that preserves its users’ anonymity using multiple layers of encryption. It is a decentralized network, like Tor. However, it encrypts the data end to end. It even keeps the sender and the destination secret from the middle servers. It can easily be integrated into different software to anonymize their activity over the Internet. Supported protocols include regular TCP/IP communications through I2PTunnel, a simple tunneling application to browse the web, BitTorrent, eDonkey, Gnutella and more. However, it is still considered beta, so there may be some bugs left.
Peer Guardian/MoBlock
Peer Guardian, for Windows and Mac OS X, and MoBlock, for Linux, are open source IP filtering programs. These programs allow you to create a list of IP addresses that you don’t want your computer to connect to. You can easily find extensive lists for anti-piracy groups on the Internet. These lists include IP addresses from the MPAA, MediaDefender and other groups which use peer-to-peer networks to find people to sue for copyright infringement. This way you can make sure you share only with real peers and not with people who want to spy on you.
VPN
If the solutions above are not enough, or if you don’t want to sacrifice speed by going through an anonymous network, then subscribing to a VPN may be a good alternative for you. A VPN, a virtual private network, is similar to a home network except that it sits on the Internet instead of being local. Once you are connected, all outside requests you make through the network use the network’s IP address. This means that an outside observer can only tell which VPN a request comes from, not who originally made it. Also, from an ISP’s point of view, all the requests look like they are directed to the same address and the content of the packets is usually encrypted. An example of a VPN provider is IPREDator, a service offered by the founders of The Pirate Bay. For an extensive list, you can google for VPN providers.
Most file sharing protocols also allow encrypted transfers. Just look through your client’s options. Even though the encryption is pretty weak, it can mislead your ISP’s filtering system into thinking it is regular traffic. Another useful tool to help you navigate anonymously is FoxyProxy. It is a small Firefox add-on that eases the transition from one service to the other. This way you can switch from normal browsing to anonymous browsing in a single click.
So don’t be a dummy and protect your privacy!
Interface Lift And An Awesome Wordpress Plugins List
As you have probably already noticed, depending on your sense of observation, I changed my blog’s design again today. The main reason for that change is that I wasn’t satisfied with the header and sidebar of the past theme. So here it is, a new theme, minimal again, but effective. There are absolutely no images in the whole design except for the RSS icon. I kept the footer unchanged since I like its look and feel and I couldn’t find anything to improve.
But since I am not a professional designer and I can’t write a full post about the characteristics of the layout I chose or the typography concepts in use, I will write about the Wordpress plugins I use in the theme. As you may already know, this blog is driven by Wordpress and plugins play a big part in this CMS. I am not a huge plugin user, but there are still 10 that I consider essential to a good blog.
Akismet
The first in my list is the only one that actually comes with a fresh Wordpress install, even if it is deactivated by default. Akismet is one of the most powerful and useful plugins. It is a spam filter that becomes especially handy when your blog starts showing up in search results. With approximately 99% of the comments posted on my blog being spam, I can’t live without this one.
All in One SEO Pack
All in One SEO Pack, as its name says, is a plugin that facilitates the search engine optimization of your blog. Using a simple administration interface, you can change the title of your pages with simple rules, and it automatically adds various information to your pages’ headers, like keywords or descriptions.
Google Analyticator
I first installed this one in a moment of laziness. I didn’t want to manually integrate the Google Analytics code into my footer template in case I change themes in the future. However, it is when I visited the settings page that I realised all the power the plugin offers. Not only does it allow you to embed the analytics code automatically in your blog, but it also offers a variety of advanced options to tweak the service to your needs.
Google XML Sitemaps
Google XML Sitemaps is a complement to All in One SEO Pack. Don’t be tricked by its name: it doesn’t only generate your blog’s sitemap and submit it to Google, it also does so for Yahoo!, Live Search and Ask.com. This is a really good plugin if you want your blog to come up in search results. It also leaves a lot of room for the user to customize the creation and the submission processes of the sitemap.
ShareThis
ShareThis is an essential for social bloggers. Like AddThis and other similar services, it makes it easy to embed links to delicious, digg, Twitter, Facebook and other social services in your posts. The difference between ShareThis and these other services is that it does it better. The way the plugin is designed lets you position it wherever you want and change its appearance to match your website.
Simple Tags
Simple Tags is a ninja plugin. Your users will never notice you are using it, but that doesn’t mean it is not useful. It allows you to manage your tags really easily in the administration interface. You can mass tag as well as auto tag your posts using different auto tag services. On the front-end side, it offers more functions to display your tags the way you want, whether it is in a list or in a cloud.
SyntaxHighlighter Evolved
This one is a must for coders. SyntaxHighlighter Evolved is the most complete code highlighting plugin I found. It supports a wide array of languages and offers different color schemes to match your design. It also displays your code in a concise and effective way. If you write code often, you need this one.
Twitter for Wordpress
For the Twitter fans out there, this plugin allows you to integrate the Twitter feed of any public profile. It lets you choose what information to grab, what to display and how to display it. Twitter being pretty minimal, the plugin is also pretty minimal, but it does what it should do and it does it well.
Wordpress Automatic Upgrade
Another plugin for the lazy bloggers, Wordpress Automatic Upgrade does what it says it does: it automatically upgrades your Wordpress install when a new version comes out. It does so step by step, asking for user input between each step to make sure nothing goes wrong. It also gives you links to backups of your database and system in case something breaks during the update process.
WordPress Related Posts
The final plugin in my top ten list is Wordpress Related Posts. This one is a plus for your readers and yourself at the same time. It displays links to posts similar to the one currently displayed by analysing the tags and categories of each post. This hopefully keeps visitors on your blog for a longer time, giving them more to read.
So this is it. A new interface and a new awesome list of Wordpress plugins. I hope you enjoy your new reading environment and never forget: stay KISS!