All Things Techie With Huge, Unstructured, Intuitive Leaps

SQL Update Needed

We humans are manufacturing and storing data at hyper-exponential rates. Most of it gets tucked into a database somewhere, where it is available for retrieval through an SQL (Structured Query Language) call.

SQL was first defined at IBM in the 1970s. There have been enhancements through the years, but this middleware (I call it middleware because it represents the link between the database and the user) hasn't evolved to meet the realities of the modern internet paradigm.

We are storing data, meta-data, graphics, music, and all sorts of digital stuff like we never have before. And what do we do with it? We plunk it into a database and for the most part, there it sits.

Companies have come to realize that static stored data can be monetized and contribute to the bottom line. So they purchase all of these add-on data mining and data warehouse software packages to slice and dice their data. And what do they use? SQL that hasn't changed very much from when it was conceived.

What we need now is a serious SQL upgrade. We need functions like SELECT REGRESSION(ColumnA, ColumnB) that will help us analyze the data. We need stuff like SELECT WEIGHTED_MOVING_AVERAGE(ColumnA, 100 values). We need to know if a table insert is an outlier or a fat tail. We need to know the R-SQUARED of column values. We need to be able to read the contents of videos in BLOB columns. Little of this is available in everyday SQL.
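Until the language catches up, the gap can be papered over by bolting a custom aggregate onto the engine. The sketch below is only an illustration, assuming SQLite driven through Python's standard sqlite3 module. The R_SQUARED function name, the RSquared class, the readings table and its column names are all made up for the example and are not part of any SQL standard.

```python
import sqlite3


class RSquared:
    """Hypothetical aggregate: R-squared of a straight-line fit between two columns."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def step(self, x, y):
        # SQLite calls step() once per row; rows containing NULL are skipped.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def finalize(self):
        # Squared Pearson correlation, built from the running sums.
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if self.n < 2 or var_x == 0 or var_y == 0:
            return None
        cov = self.n * self.sxy - self.sx * self.sy
        return (cov * cov) / (var_x * var_y)


def r_squared_of(rows):
    """Load (a, b) pairs into an in-memory table and run the aggregate from SQL."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("R_SQUARED", 2, RSquared)
    conn.execute("CREATE TABLE readings (column_a REAL, column_b REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    (result,) = conn.execute(
        "SELECT R_SQUARED(column_a, column_b) FROM readings"
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    # Points lying exactly on y = 2x + 1 should score a perfect fit.
    print(r_squared_of([(1, 3), (2, 5), (3, 7), (4, 9)]))
```

The point is that the query itself stays a one-liner; all of the statistics live in the function that the engine calls row by row, which is where I would like the database vendors to put it in the first place.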

We still labor on with primitive functions in the middleware, with rich clients or heavy back-end stored procedures to compensate. It is time that someone looked at making intelligent middleware. We need to change it from Structured Query Language to Superb Questioning Language.


Interesting Problem Requires Algorithm

As a data geek, I get a lot of questions directed at me looking for real-life solutions to data problems. Consider this one that came in my email last night.

A social service runs a drop-in center for people. They provide services free of charge. They cannot take the people's names, because if the people thought that it was not a confidential service or that they were being identified in any fashion, they would stop coming.

They run these service sessions once a week. What they are allowed to record is only whether the person is a newcomer or has attended before. People try to attend weekly, but they come some weeks and skip other weeks. Some drop out forever, and some just for a little while.

The data held by the social service consists of just the following -- 1) the week 2) the number of people who have attended previously and 3) how many newcomers there are each week. They have five years of data.

Now, the government subsidizes this social service, and the government has asked a simple question that they cannot answer. The question is "How many unique individuals have you seen this year?"

It's a fairly complex problem. They don't know whether the start of the data coincides with the start of the program. If it did, then it would be an easy exercise. They would just take the starting number of attendees and add all of the newcomers on a weekly basis. But this isn't the case.

With the starting number that they have, they don't know if the breakdown includes newcomers as well as previous attendees. And if the starting number includes previous attendees, how many are there? They also don't know if the pool of starting individuals is complete, or if some individuals are missing from the pool because they had dropped out when the counting started. In other words, there could be any number of individuals who attended before the counting started and then randomly show up throughout the year.

So the question is "How many unique individuals have been provided services since the counting started?" Government grant money rides on having an accurate answer.
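Without giving away a full answer, two pieces of the arithmetic can at least be pinned down. The sketch below is not the promised solution. It assumes each week is recorded as a (returning, newcomers) pair, and the function names are made up for illustration. The first function is the easy case, valid only if record keeping began with the program. The second is a conservative floor for a single year: every newcomer is a distinct person, and any returning attendees in a given week beyond the newcomers already counted earlier that year must be people who first came before the year began.

```python
def easy_case_total(starting_attendees, weekly_newcomers):
    """Unique individuals if the records begin on the program's first day."""
    return starting_attendees + sum(weekly_newcomers)


def yearly_floor(weeks):
    """Lower bound on unique individuals seen in one year.

    weeks is a list of (returning, newcomers) pairs in date order.
    """
    newcomers_so_far = 0
    carried_over = 0  # people who must have first attended before this year
    for returning, newcomers in weeks:
        # At most newcomers_so_far of this week's returnees joined this year;
        # the remainder first attended in an earlier year.
        carried_over = max(carried_over, returning - newcomers_so_far)
        newcomers_so_far += newcomers
    return newcomers_so_far + carried_over


if __name__ == "__main__":
    sample_year = [(10, 2), (8, 3), (14, 1)]  # made-up numbers
    print(yearly_floor(sample_year))
```

The floor is safe to report but almost certainly too low, because it treats the busiest week's returnees as the only people carried over from earlier years. Closing the gap between that floor and the truth is the interesting part of the problem.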

How would you solve this? Comments are open.

(I'll put up my solution later)

Line Formation Elasticity -- RugbyMetrics

A lot of objective information is falling out of the results from my software tool called RugbyMetrics. While I was doing extensive statistical data mining on actual professional rugby games in the Aviva Premiership, an incredible statistic fell out of the exercise: line formation elasticity.

Rugby is a game where the defense lines up across the field to defend against a similar line of offense. When a player carrying the egg finds that his forward progress is blocked, he passes left or right down the line to his teammates. If there is a hole in the line somewhere on either the defense or offense, then there is a problem.

So I decided to measure line elasticity -- how quickly the line forms or reforms after it is distorted from a play. This analysis fell out of another analysis where I did a ratio of jersey counts between attackers and defenders at the time of tackle, which had a very interesting result.

What the line elasticity measure showed was that the more efficient the line was at reforming, the more successful the play (both on offense and defense). This is especially evident when the team with possession grinds away for a long time with very little field gained: the opposing defensive line is very elastic and very efficient at reforming.

What frame-by-frame video also showed was the laggards who were late in assuming their positions, thus leaving holes in the line. It was very interesting.

From there, when we saw that we could identify the defensive laggards, we saw that we could assign a numeric coefficient of line efficiency, both at a team level and at a player level.

From there, it was a short step to rating the roster of a team and letting the results settle into a hierarchy of the best players. There are many developed measures of a player's worth coming out of RugbyMetrics. The thought struck me that if a player is negotiating a raise in his contract, one of the bargaining chips could be a RugbyMetrics analysis to show that he is in the company of the best of the breed in the Premiership. Conversely, a team could use RugbyMetrics to prove that a player asking for a raise tends more to a journeyman than a star.

It's all fascinating stuff, and data mining and performance analysis open the doors to it.

The Future of Online Games

After watching "The Greatest Movie Ever Sold", I have come to the conclusion that everything is, or can be made into, a vehicle for advertising. The movie is a movie about making a movie about trying to make a movie about the whole movie being one big advertisement and product placement exercise. It aims at total transparency about the crass world of advertising in movies, using as its vehicle a movie all about product placement, replete with commercials.

While watching it, I was struck by the relatively large amounts of money spent on advertising. And then, I was struck with the vision of online browser games of the not-so-distant future.

Playing games for virtual points is so lame. With foursquare.com making life a game, why not have real-life prizes for virtual games? This is the way that I see it evolving:

Let's say Pepsi wants to take advertising with online games to a new level. The first thing that they do is introduce a new product called "Pepsi Shooter". It is an inexpensively packaged slurp of Pepsi, the size of a shooter glass. The shooters are distributed to stores like 7-11, but you can't buy them. You have to go online and play a game called Pepsi Shooter. When you reach a certain level, the server sends you a bar code that you print out, and the next time that you are at 7-11, you get to collect your Pepsi Shooter prize.

Collect six shooters, and you get a full-sized Pepsi. Collect six full-sized Pepsis by further game play, and you get a visor or a cap. You can see how this works.

The server keeps track of the players. They must log in and authenticate, and that way you collect the emails, phone numbers and names of the target demographic. From then on, you can keep marketing to them until they reach an age where Coors Light replaces the Pepsi and you sell the data to the Coors folks.

The game could be played on a cell phone, an iPad or an iPod. Getting real stuff for playing virtual games is an idea whose time has come, and I am willing to bet that you will see this concept exploited within 12 months. Race you to the patent office on this one.

Stuff to think about while squashing ants

On my daily walk on the steam train tracks, I watched a red-winged blackbird sitting on the steel rail, picking off a line of ants marching along the track. The bird would wait until a long line of ants formed, and then would jump down and gobble as many as he could. The ants would disperse. The blackbird would then hop away, wait for the line to re-form and then repeat the procedure. The buffet marched up to him.

As I saw this, my mind was on the ants, and what they were thinking and how they did it. An ant is hatched from an egg. It has an Octo-Mom queen for a mother who has a gazillion offspring, and the first ant it sees upon hatching is a worker drone shoving some food into its face and moving down the assembly line to do the same to its other 480 siblings of the hour. There is no learning curve. There is no time for it to be taught anything. Quicker than you can say "Nike Asian Child-worker Sweatshop", the ant is sent out into the world to beg for spare change and find uneaten McDonald's stuff in the trash bins of the world.

So where does an ant get its "being-consumed avoidance software" from? Its uncaring mummy doesn't teach it -- not with a constant stream of eggs oozing out of her derrière. The logical answer is that the ant hatches complete with embedded firmware. For you non-techies, firmware is the internal software that makes a computer behave like a computer, and makes smart devices smart.

I've detailed the flow diagram of evaluating ant threats on a high level in the illustration. First of all, the ant sees the big black bulbous body of the blackbird with a pointy yellow beak. That is the first bit of input information. Then it sees its 312th brother twice removed get skewered by the bird's beak, and gobbled down into the gaping maw. It's bye bye into the blackbird. The ant brain thinks that this can't be a pleasant experience, and issues the general alarm to run away.

This is innate programming. It is coded into the neural nets of the ant brain and present when the ant is hatched. So, I began to think of how the firmware was transferred to the ant. How did the brain software get transferred to the ant?

Of course, the answer is in the DNA of the ant. That factoid made me stop and think. Not only is DNA the blueprint of how to build the ant, but it is also the memory device for the firmware load. It can "bomb the PROMs", so to speak. (Hardware engineers will know that PROMs are Programmable Read-Only Memory arrays that hold the firmware. They have fuses in the transistor arrays that are blown out in the programming process. Hence bombing the PROM.)

The polynucleotides that make up DNA are not only a construction pattern, but they are like USB memory sticks that hold data for the finished organism. The complex reasoning path outlined in the above illustration of the non-bye-bye-blackbird algorithm is embedded in the DNA.

Never mind Moore's Law and the shrinking transistor, we are talking about a memory dump on the chemical molecular level. That got me to thinking. Is the data stored in bits and bytes? I don't think that the DNA molecule can handle binary data, because DNA itself is quaternary. Instead of zeros and ones like a computer, it has four states.

DNA is made up of adenine, thymine, guanine and cytosine. These four chemicals, arranged in a chain of gazillions, hold the pattern and the data to make an ant, a flower or me. They would be more compact at coding information. Take the word "crap". Because binary is two-state, the computer sees the word "crap" as "01100011011100100110000101110000". A nibble (four binary digits) can represent 16 different values, while four quaternary digits can represent 256. In English, "crap" takes up four characters. In binary, it takes up 32 bits. In quaternary, it takes up just 16 digits, because each base-4 digit carries two bits' worth of information.
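To check the arithmetic, here is a minimal sketch that spells out the same word both ways. The mapping of base-4 digits to the letters A, C, G and T is purely an assumption for illustration (nature has no such ASCII table), and the function names are made up.

```python
BASES = "ACGT"  # assumed mapping: digit 0 -> A, 1 -> C, 2 -> G, 3 -> T


def to_binary(text):
    """Eight binary digits per ASCII character."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def to_quaternary(text):
    """Four base-4 digits per ASCII character (two bits per digit)."""
    digits = []
    for byte in text.encode("ascii"):
        for shift in (6, 4, 2, 0):
            digits.append((byte >> shift) & 0b11)
    return digits


def to_bases(text):
    """Spell the base-4 digits with the assumed A/C/G/T mapping."""
    return "".join(BASES[digit] for digit in to_quaternary(text))


if __name__ == "__main__":
    word = "crap"
    print(len(to_binary(word)), to_binary(word))
    print(len(to_quaternary(word)), to_bases(word))
```

The quaternary string is exactly half the length of the binary one. That factor of two is the whole saving: four states per position instead of two means each position does the work of two bits.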

So, with DNA being a good and efficient type of memory, would it be better if our computers were quaternary instead of binary? You bet. In principle they could run faster and hold more data in less physical space, since each digit carries twice the information. That's something to consider for computer design engineers addicted to the binary way of life.

So, back to the firmware embedded in the ants. It's intriguing to think that the brain of a lower life form is populated with a bunch of algorithms, but a human baby's brain is not. The most intriguing concept is that DNA can be altered to carry information as well. Imagine if a baby was born with the neural nets for speech already intact in the brain. Imagine if a baby was born with neural nets to do calculus while lying in the crib staring at the Fisher Price toys.

It would be a challenge to figure out how to transfer firmware with DNA, but I bet you that it could be done. Oh Brave New World.

Twitter to Web Hits Conversion Statistics

I sometimes treat the Internet like a live organism. A good biologist does subtle experiments on an organism to learn its nature. Sometimes it helps to treat the web as such.

For example, I often wondered what the empirical ratio was for converting Twitter followers into web hits or page views.

It is easy to get Twitter followers. Most people on Twitter do a tit-for-tat, I'll-follow-you-if-you-follow-me thing. Needless to say, these types of followers are not high-quality in the respect that they really aren't interested in your content. All they want of you is to follow their narcissistic tweets and appreciate how important they are in the grand scheme of things.

I have a plethora of life coaches, motivational speakers, stars and celebrities that I have never heard of and other assorted folks following me on Twitter. So, the big question is how many of these can I convert to go and look at a related web page?

I decided to do a test. I put up a blog of lifestyle-specific aphorisms and created a Twitter account to match it. Then I went to work on Twitter. I followed everyone that Twitter suggested. I soon got a respectable following.

I used tools like Tweepi.com to chop out the non-followers and folks who wouldn't play the game. My Twitter account advertises my website. I am not going to give the URLs out, because the experiment is continuing and I don't want to skew the results.

After a few months of experimentation, I have a preliminary answer that is holding fast. The numbers aren't very promising. I can convert only 2.5% of my Twitter followers to become regular followers on my website.

I suppose that it doesn't matter for large numbers. If you have hundreds of thousands of followers, then the 2.5% is significant, and if you can attract that many, then you have content that will boost the percentage that convert from Twitter as well.

But if you are just starting out in social media marketing and need a rough guideline to start, the 2.5% is holding fairly steady for me. And that is my 2.5 cents.
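For anyone who wants to plug in their own numbers, the arithmetic is trivial. This is a minimal sketch, assuming the 2.5% figure from my experiment carries over to your account (it may well not). The function names are made up for illustration.

```python
def conversion_rate(regular_visitors, followers):
    """Share of Twitter followers who became regular site visitors."""
    if followers == 0:
        return 0.0
    return regular_visitors / followers


def expected_regulars(followers, rate=0.025):
    """Rough head count of regular visitors at a given conversion rate."""
    return round(followers * rate)


if __name__ == "__main__":
    print(expected_regulars(400))
    print(expected_regulars(200_000))
```

At 400 followers the payoff is about ten regular readers; at 200,000 it is about five thousand, which is why the low percentage stops mattering once the numbers get large.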

Toby Flood Reduced To An Equation

Toby Flood is a fly-half for the Leicester Tigers, and a rugby star in the Aviva Premiership. This is his photograph from Wikipedia:


It's almost sad, but true that Toby's running game whilst playing rugby can be reduced to a mathematical equation. If you had to describe Toby's running game performance mathematically, you would do it this way:

Obviously I am not going to tell you what x and y stand for, because they came from digitizing and sifting through mounds of data to come up with the mathematical model using predictive analytics and linear regression.

However, if you wanted to choose a player with Toby's prowess, this formula would be incredibly helpful. It was derived using my software package called RugbyMetrics which adds objective knowledge of the game through data-mining and sifting through mounds of statistics.

Click on the video below to watch Toby kick a conversion after a Tigers try. The fly-half is really good!!

video

Spy Adventures in Cyber Space


This is a lighthearted piece that occurred to me in the shower, so take it for what it's worth. With the "removal" of Osama bin Laden, it occurred to me that the CIA wasn't exactly sitting on their cans, opening Democrats' mail, and trying to flush out the remainder of the geriatric Russkie commies hiding in the woodwork.

The real threat turns out to be an ideological one with a marginalized population following a stone-aged religion bent on destroying another culture that they secretly admire and secretly desire. After all, didn't the 9-11 terrorists visit a strip bar a few weeks before they committed their acts of infamy?

So what does all of this have to do with software and technology and such?

I had a eureka moment for a huge, very fun spy operation that the CIA could run on the internet. Everyone needs money, and a relatively easy way to make money is by putting ads on your website, and getting paid for clicks.

So what if the CIA went to Pakistan and set up a fake Google AdSense? It could be called PiastrePlot or something similar, where jihadists, the Taliban and such would put ads for Egyptian burial robes, Suicide Sam Brownes suitable for strapping on a pound of plastique or a bit of nitro, Casio watches, exploding underwear, shoe bombs, Quran quote sweatbands, and such on their websites and generate some actual cash. It would be a sting operation.

The clever bit is that the ads would track the clicks and send back the info to the CIA. The person doing the clicks would then get an investigational visit from an overhead drone.

It's about time that Spy-vs-Spy visited cyberspace and that the CIA ran an operation like the Brits did against the Germans with a floating dead human body carrying a briefcase with fake information.

This appeals to us nerd types who spend a lot of time in front of a monitor and keyboard, but dream of James Bond type adventures in exotic locales. I would contribute to the coding effort of PiastrePlot, but only if I get to meet Pussy Galore and Her Acrobats. Spying on the internet could be such fun.

Smart Content Management -- Enhancing UX and Killing Manual Navigation

Many content management systems and content management web applications fall short of serving the client well. A central precept behind content management is that documents, videos and all sorts of content that a corporation has should be made available to the public when they seek to buy the goods or engage the services of that company. For example, if a company sells widgets and issues a service bulletin to its own techs, and it puts that bulletin up on the web, a customer might find it, fix their own problem and be happy with the company. Letting the public in on content not only educates them, but the additional information may trigger more sales with lower costs associated with those sales.

However, as a techie, I find that a lot of content management systems are just awful. What they do is throw a whole pile of document links in a browser, and let the viewer decide what they want to read. The User Experience is horrible. I have been the victim of this type of system, and there was a lot of frustration before I finally gave up and went to Google with direct search terms. It always took me to a site that was better marked with the content that I wanted.

So in what ways must content management systems improve? Manual navigation through the document repository must be eliminated. There must be an AJAX widget to select related material and the content offering page must be continuously updated as more information is gleaned from the user.

A methodology for smarter content management and enhancing the User Experience by eliminating unnecessary navigation could be implemented in many ways. One of the methods is to collect the breadcrumbs of visitors to the websites, and use the progression of links to scorecard the documents and determine the logical groupings of them.

One could also mine the metadata with the same result, and assign probabilities to related sets of documents, reducing manual navigation.
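One way the breadcrumb idea could work is sketched below. This is a minimal illustration, not a description of any shipping product. It assumes each visitor's trail is simply a list of document names, scores two documents as related every time they appear in the same trail, and suggests the highest scorers. The names score_related and suggest, and the sample trails, are all made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def score_related(trails):
    """Count how often each pair of documents shows up in the same visit."""
    related = defaultdict(Counter)
    for trail in trails:
        # Each document counts once per visit, however often it was clicked.
        for doc_a, doc_b in combinations(sorted(set(trail)), 2):
            related[doc_a][doc_b] += 1
            related[doc_b][doc_a] += 1
    return related


def suggest(related, current_doc, limit=3):
    """Documents most often read alongside current_doc, best first."""
    return [doc for doc, _ in related[current_doc].most_common(limit)]


if __name__ == "__main__":
    trails = [
        ["manual", "bulletin-12", "faq"],
        ["manual", "bulletin-12"],
        ["faq", "pricing"],
    ]
    print(suggest(score_related(trails), "manual"))
```

Dividing each count by the number of visits that included the current document would turn the scores into the probabilities mentioned above. Either way, the page can offer the next document instead of making the visitor hunt for it.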

This has to be an imperative improvement to content management systems, because we are on an exponential curve of generating content, and he who handles it best, wins in the marketplace. Smarter is not only better -- smarter is richer.

Google has the right idea and they have the biggest content management business in the world. Their system has smart suggestions and a relevance rating.

Over-the-counter management systems must collect metadata on users' searches, and use that data to improve the user experience.

Regress to Success -- RugbyMetrics

So let's suppose that you run a rugby team in the Aviva Premiership or any other professional rugby club. So you haven't qualified for the Heineken Cup and your team is full of journeymen players and you consistently sit in the cellar of the standings table. And let's suppose that you don't have a Daddy Warbucks owner that can buy you a Dan Carter and you want to create a competitive team.

So what are you going to do? You have to find young untried players who will eventually turn into Thomas Waldrom, Schalk Brits, Chris Ashton or Tom Wood. How are you going to identify them when they haven't had a chance to prove themselves and amass some statistics to prove that they have the stuff of the egg-chasing gods?

You turn to the geeks, that's how you do it. How so? You regress your way to success. You would use my RugbyMetrics tool (click on this LINK to see all of the articles on RugbyMetrics). Then you would take a game film of your targeted acquisition and, using the tool, digitize that player's performance. From there you would use advanced statistics to create a mathematical model (using regression and Bayesian inference) to determine if your player has the right stuff.

How does it work? The seeds of athletic greatness are sown early. However, they may not become manifest because the player is not on a team that enhances his skillset, or he is blindside oriented on a team that is predominantly openside oriented. There are many, many reasons. Even so, that player will demonstrate the subtle qualities that show that he has the key performance indicators that tend to greatness.

So what are these KPIs or key performance indicators? They are a new set of statistics that are gleaned from data mining every aspect of the game. These are proprietary knowledge to the users of the system. But as a trivial example, one finds that an Olly Barkley will average x carries, gaining y yards, in a certain ratio to the opposition yards gained. This is objective, scientific knowledge of the game of rugby that comes from the field of predictive analytics.

So once you have the mathematical formulas gleaned from going through mountains of statistics, you can eliminate the pretenders and give yourself a roster of possible stars. This is not meant to replace the years of coaching and scouting, but rather it is meant to give the teams a scientific, valid starting point when scouting for new team members.

The interesting aspect is that the front eight will have different formulas than the back seven, and each position will have different regression parameters in the models. Style of play comes into effect as well. If you like a Tom Wood style of play, you would determine the mathematical model by analyzing his performance and looking for players who have similar numbers to him. It sure beats the shot-in-the-dark method of picking a player who "looks good".
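To make the idea concrete without giving away anything proprietary, here is a toy version. It is a sketch under loud assumptions, not RugbyMetrics code: it fits an ordinary least-squares line of yards gained against carries for a benchmark player, then ranks prospects by how far their games stray from that line. The function names and every number in it are invented for illustration; the real KPIs and models are different.

```python
def fit_line(carries, yards):
    """Ordinary least-squares fit of yards = slope * carries + intercept."""
    n = len(carries)
    mean_x = sum(carries) / n
    mean_y = sum(yards) / n
    sxx = sum((x - mean_x) ** 2 for x in carries)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carries, yards))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_residual(model, games):
    """Average distance between a prospect's games and the benchmark line."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in games) / len(games)


def rank_prospects(model, prospects):
    """Prospect names ordered from most to least similar to the benchmark."""
    return sorted(prospects, key=lambda name: mean_abs_residual(model, prospects[name]))


if __name__ == "__main__":
    # Invented benchmark games: (carries, yards gained) per match.
    benchmark = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    prospects = {
        "Prospect A": [(2, 5), (3, 7)],
        "Prospect B": [(2, 2), (3, 3)],
    }
    print(rank_prospects(benchmark, prospects))
```

A real model would use many more variables and separate parameters per position, but the shape of the exercise is the same: fit the star, then measure everyone else against the fit.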

If you have any questions, please leave a comment and I will answer them.

Giving Thanks for the Free Software

Today, I just want to give props to some free software, freely available on the web that has helped me. Yesterday I mentioned the DivX codec pack, and you can download it for free from here:


Another bit of software that has helped me tremendously is a DVD-to-AVI converter. It is a small, free tool, and it is quick. Here is a screen shot:


The URL for downloading this tool is:




And props also go to CoolUtils.com. I needed to convert a jpg pic to an .ico icon file for Visual Studio, and it was done promptly and easily online at:



Thanks a bunch. Readers of my other blog will know that I reciprocate, putting up code for RFID and mag card readers etc.

And this was lying on my desktop. If you need a green light pic for any development that you do, I created this one pictured below. Please take it if you need it.


Thrilled and Not Thrilled With Gmail -- The Anonymous Internet is Dead

For starters, I like web-based email and I use Google's Gmail. At first it was a little weird because you can't make folders and do stuff like in Outlook. However, I came to see that web-based email was the cat's meow, especially since I travel a lot. Gmail is a lot more resistant to viruses, and since I went to Gmail, Chrome and Avira, I have never had a virus in spite of accidentally visiting some dodgy sites.

I have been asked to try a tool written in Ukraine that consolidates all of my email accounts, Twitter, Facebook and everything into one tool. It sounds good, but the paranoid me would never trust my communications to some code written in a land where I could not get satisfaction through the courts if my bank password credentials were ever reported back to the coder and sold to various nefarious entities operating from behind the Old Iron Curtain.

So that got me thinking about data privacy, anonymity and such, and I came to the conclusion that it is now impossible to be totally anonymous on the Internet. With the FBI running programs like the old Carnivore, where every single email sent is archived and trawled through for words that are "threats to the United States of America", it is impossible to be totally anonymous. Osama bin Laden knew that, and that is why he never had internet or phone service in his hideaway.

Back to Gmail. I was reading my Gmail, and was absolutely fascinated by how relevant the ads were to the content of my email. Once the email is opened in the browser, it appears that an AJAX widget reports back the keywords of the content of the email and offers me ads. This is a lot like the movie Minority Report where, as Tom Cruise is walking, the advertising kiosks recognize him and tailor ads to his tastes.

This isn't as harmless as it sounds. Picture this. Google already knows who I am. They hold my emails for me. Then, a Google widget reads my emails when I open them and sends back the keywords. How much do you want to bet that Google saves those keywords and data-mines them? Remember, they know who I am. They have asked me for a backup email address and a bunch of personal information. They hold my emails, and they save the keywords of the content of my emails. They are just one small step over a thin gray line from being Big Brother.
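I have no idea what Google's code actually looks like, but it does not take much to pull keywords out of a message and match them to ads. The sketch below is a guess at the general technique only: count the words that are not filler, keep the most frequent, and look them up against advertiser trigger words. The stopword list, the function names and the sample ads are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; a real one would be far longer.
STOPWORDS = {"the", "and", "for", "that", "this", "with", "from", "are", "was", "you", "your"}


def extract_keywords(text, limit=5):
    """Most frequent non-filler words in a message."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


def match_ads(text, ads):
    """Ads whose trigger words overlap the message's keywords."""
    found = set(extract_keywords(text))
    return [ad for ad, triggers in ads.items() if found & triggers]


if __name__ == "__main__":
    email = "Flight to Toronto booked. The flight leaves Friday and the hotel in Toronto is confirmed."
    ads = {"Cheap airfares": {"flight", "airline"}, "Lawn care": {"lawn"}}
    print(extract_keywords(email, 2))
    print(match_ads(email, ads))
```

Now imagine the keyword lists being saved per account, message after message, year after year. That accumulated list is the part that worries me, far more than any single ad.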

I remember reading a book about the Allied intelligence effort in World War II. They perfected the art of content analysis. Agents would collect the newspapers from small towns around Germany. From the aggregate, they learned the entire picture of the war effort.

From the death notices, they learned the casualty rates of the war. The social columns would print up who went off to war, and they could determine troop build-ups. Other stories and public notices about rations would give them an idea what commodities were in short supply. In other words, content analysis can reveal a lot about you -- especially if you have a software widget reading the mail.

So what's the answer? I just fashioned one - a data privacy tool where one uses an encrypted tunnel to a server, and then encrypts the traffic as well. There are all sorts of crypto keys and authentications and then comes the fun part. All of the communications, such as the email, the instant messenger and such, as well as the data storage, are never broadcast over the internet. The email is non-SMTP. The instant messaging is non-IRC, and the data storage is anti-cloud. It is not somewhere over the internet -- it is in a bunker that you can visit, and the only access is through the tunnel, plus your USB key containing all of the magic. And of course, you never trust the GUI (Graphical User Interface) to a browser. It is all rich client for security.

The last piece of the puzzle is that you buy your own server as well to host this system. Total anonymity.

Most people don't have the luxury of their own private system, so I guess that we have to get used to the idea that Big Brother is watching, and we hope that he is a benevolent Big Brother.

I have seen Big Brother, and his name is Google.

Microsoft Codecs Bite

I just finished my rugby video application, and one huge area of risk was the Microsoft video codecs. The ones that they ship with their operating systems don't play well in the sandbox with .NET video playing code.

Specifically, when I would pause a movie, and programmatically advance the frame counter, oftentimes I would get a black screen and the movie was playing but the graphics were not being painted.

I connected a slider bar to navigate the video, and the slider bar sometimes worked and at other times, I would move the slider to a new position and hit play, and the movie would resume playing at the old position. It was hit or miss.

I finally got fed up, and converted the .avi video to DivX. Then I downloaded the DivX codec from DivX (quel surprise) and after installation, the video software components worked better than expected.

Not only did I get better screen resolution, but the slider worked as smooth as butter, and the accuracy in frame positioning was an absolute delight.

You can get the free DivX codec pack from here: