Coding

7 minute talk follow-up

Some follow-ups, corrections, and expansions.  Being correct takes effort.

First, someone very knowledgeable on the internals noted that I was sloppy with the terms ‘datastore’, ‘GBase’, ‘GoogleBase’, ‘GQL’, and ‘BigTable’.  Mea culpa.  Datastore is the most generic term, and the specific one for Google App Engine is referred to as the “App Engine datastore”.  The App Engine datastore is accessed through GQL, a language reminiscent of SQL.  The App Engine datastore is built on BigTable and exposes some of BigTable’s capabilities (see Wikipedia, a formal paper, or video documentation).  GoogleBase is an independent Google product that is also built on BigTable.  GBase is a guitar search engine and the naturally elided form of “GoogleBase” after saying it a hundred times.  Whew!  Terms.

Speaking of terms, the contract term dealing with indemnification in the Terms of Service:

13.1. You agree to hold harmless and indemnify Google, and its subsidiaries, affiliates, officers, agents, employees, advertisers, licensors, suppliers or partners, (collectively “Google and Partners”) from and against any third party claim arising from or in any way related to (a) your breach of the Terms, (b) your use of the Service, (c) your violation of applicable laws, rules or regulations in connection with the Service, or (d) your Content or your Application, including any liability or expense arising from all claims, losses, damages (actual and consequential), suits, judgments, litigation costs and attorneys’ fees, of every kind and nature. In such a case, Google will provide you with written notice of such claim, suit or action.

The annoying clause is “(b) your use of the Service”.   Given how claims are written in patents, it is entirely likely that use of the API would be an actionable breach if the Google App Engine violated patents.  Google could require indemnification by users of the API.  For most people and smaller companies, the reputation of Google and pledges to “not be evil” should be sufficient.

Speaking of evil (note the clever transition), the lazy index evaluation that makes the database look like “read committed” is discussed here.

Finally, high availability is hard, and “9’s” go faster than you remember.

90% (1 nine) is a downtime of 36.5 days per year.
99% (2 nines) is a downtime of 3.65 days per year.
99.9% (3 nines) is a downtime of 8.76 hours per year.
99.99% (4 nines) is a downtime of 52.6 minutes per year.
99.999% (5 nines) is a downtime of 5.3 minutes per year, or six seconds per week.
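The figures above are just the unavailable fraction multiplied by the length of the period. A minimal sketch of that arithmetic, assuming a 365-day year (the function and constant names are mine, for illustration only):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600


def downtime_seconds(nines, period_seconds):
    """Allowed downtime, in seconds, for a period at the given number of nines."""
    unavailable_fraction = 10.0 ** -nines  # 3 nines -> 0.001
    return unavailable_fraction * period_seconds


if __name__ == "__main__":
    for nines in range(1, 6):
        hours = downtime_seconds(nines, SECONDS_PER_YEAR) / 3600
        print(f"{nines} nines: {hours:.2f} hours of downtime per year")
```

Passing SECONDS_PER_WEEK instead of SECONDS_PER_YEAR gives the per-week budget.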

A claim of about 2 nines of reliability is reasonable.  Google App Engine was launched around four months ago in mid-April, so about a day of downtime over that span works out to 2 nines.  It was down on June 17 for some unreported number of hours, and was down again on June 19 and June 25.  Add in little outages where various features broke, blocking PayPal, and other nits.  There is a list that occasionally reports downtime, but no exact statistics are available.  Even if there were no future outages, it would take years of clean running to demonstrate four nines of reliability on top of the existing outages.
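To put a rough number on "years": if availability is measured from launch and no further outages occur, the downtime already on the books has to shrink to the allowed fraction of total elapsed time. The sketch below assumes a hypothetical downtime total, since no exact statistics are available; the function name is mine.

```python
def days_until_within_budget(nines, downtime_days):
    """Total days of service needed before downtime_days fits the budget.

    Assumes no further outages and availability measured since launch.
    """
    return downtime_days * 10 ** nines


if __name__ == "__main__":
    # Hypothetical totals: one full day, and six hours, of accumulated downtime.
    for downtime_days in (1.0, 0.25):
        total_days = days_until_within_budget(4, downtime_days)
        print(f"{downtime_days} days down -> {total_days / 365:.1f} years for four nines")
```

Under those assumptions, one day of accumulated downtime needs 10,000 days (about 27 years) to reach four nines, and even six hours needs 2,500 days, or nearly seven years.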

Keep on working on it.  I hope that Google App Engine will be more fun in the future.

Coding
Reviews
Web
Write-up

Google App Engine – 7 minute talk

[I gave this talk at the BayPiggies Meeting on August 14, 2008.  The 'Newbie Nugget' talk ran about seven minutes and used only spoken words.  This is my recollection of what I said.]

I drove here today.  I got in my car, used the six speed manual transmission, steering wheel, and pedals.  I had to know how to change lanes and the rules of the road.  I needed to know traffic patterns, and directions from my house to Google.  I got here.

It was the most dangerous thing I did today.  Traffic accidents kill about 40,000 people a year in the U.S. alone.   It’s the number one cause of death for everyone from age five to about my age where heart disease and cancer start catching up.  We just take it for granted.   That’s not the car I want.

I Want My Autodriving Car!  I want to push a button on my cell phone, walk out the door a minute later, and hop into the open door of a car.  It would take me where I want to go, take the best route given current traffic, and drop me in front of the door.   It would never crash.  It would just work.  That’s what I want in a car.

Here’s what I want in an application engine.   I want to write in one language.  I want one set of tools.  One naming convention.  I want to be able to translate my thoughts into code cleanly.  I want the application to be on the Internet and scale and to share that application with everyone through a URL or something like the iPhone App Store.   That’s what I want in an application server.

Google App Engine is not there yet.   Google App Engine is a step along that path, and lets you see where application engines are headed.  For the first time, you can see that installing and configuring MySQL on a server and then renting space in a rack for the server will someday sound the same as someone talking about pulling the engine in his car and changing out the rings in the driveway.  I guess you could do that yourself, but why?  You still need to work in the soup of languages:  HTML, CSS, JavaScript, HTTP headers, Flash, Python, and more.   You still need to coordinate your tools for each language:  editor coloring, debugger, make system, test harness, code coverage, and documentation.  It does us good to keep our eyes on the App Engine we want.

So, let’s talk about Google App Engine:  what it does, how to write for it, where you would use it and not.

Google App Engine is fundamentally a deployment engine.  After you write your application, it provides that application out on the Internet.  It’s also very early technology.  It sort of supports Django (’jango’ for those who talk; ‘d-jango’ for those who read), but using the latest version runs up against its thousand file limit.  Fully half of the seventy posts a day on the mailing list are about deployment quotas or looking for workarounds to deployment limits.  You cannot run any long job that takes more than a few seconds.  Still, it’s free.

Writing for Google App Engine is much like writing for any other application engine.  You use the WSGI interface, which someone taught me is pronounced ‘whiskey’, that provides a mapping between the requested URLs and the classes to serve them.  You write a class that has methods to respond to HTTP messages like get and put.  Your code spits out the HTML and whatever to respond.  Sometimes you trigger urlfetch, from urllib, of your own application like remote procedure calls to avoid hitting computation time request limits.  You store your data in the Google Base data store.
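For readers who have not seen that shape before, here is a minimal sketch of a WSGI application that maps URLs to classes with one method per HTTP verb. It uses only the standard library's wsgiref and is not App Engine's own framework; ROUTES, HelloPage, and call are hypothetical names made up for the example.

```python
from wsgiref.util import setup_testing_defaults

ALLOWED_VERBS = ("get", "put", "post", "delete")


class HelloPage:
    """Handler class: one method per HTTP verb it supports."""

    def get(self, environ):
        return "200 OK", "<html><body>Hello</body></html>"


# Hypothetical URL-to-class mapping.
ROUTES = {"/": HelloPage}


def application(environ, start_response):
    """WSGI entry point: find the class for the URL, call the verb's method."""
    handler_class = ROUTES.get(environ.get("PATH_INFO", "/"))
    verb = environ.get("REQUEST_METHOD", "GET").lower()
    if handler_class is None:
        status, body = "404 Not Found", "not found"
    elif verb not in ALLOWED_VERBS or not hasattr(handler_class, verb):
        status, body = "405 Method Not Allowed", "method not allowed"
    else:
        status, body = getattr(handler_class(), verb)(environ)
    payload = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(path, verb="GET"):
    """Drive the application without a server, for a quick check."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = verb
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

Calling call("/") returns the 200 status and the HTML body; an unknown path returns 404, and a verb the class does not implement returns 405.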

Google Base can be a problem.  The Google Base datastore is not a relational database.  It’s a great database for working data you collect from all over the web, like web pages or screen scraping, where you are working with massive amounts of data and some of it is always aging out.  It should never be used for accounting.  For example, you can’t just run a GBase query to update the total invoices for a month.  You need to walk through all the records.  If you update the date on an invoice and then retotal the invoices, you might get the wrong answer.  GBase doesn’t guarantee the invoices index is immediately updated.  It would probably work, but if you were to try running tests to see how often it fails you’d use up your quota and be locked out for a few days.
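The invoice problem can be illustrated with a toy model. This is an in-memory stand-in that I made up, not the datastore's API or its actual consistency rules: the only assumption it encodes is that an index may lag behind the records it describes.

```python
class ToyStore:
    """Toy in-memory store whose month index is refreshed lazily.

    Illustration only; this is not how any real datastore is implemented.
    """

    def __init__(self):
        self.records = {}      # invoice id -> {"month": ..., "amount": ...}
        self.month_index = {}  # month -> set of invoice ids

    def put(self, invoice_id, month, amount):
        # The record changes immediately; the index is left stale on purpose.
        self.records[invoice_id] = {"month": month, "amount": amount}

    def refresh_index(self):
        self.month_index = {}
        for invoice_id, record in self.records.items():
            self.month_index.setdefault(record["month"], set()).add(invoice_id)

    def total_by_index(self, month):
        ids = self.month_index.get(month, ())
        return sum(self.records[i]["amount"] for i in ids)

    def total_by_walk(self, month):
        return sum(r["amount"] for r in self.records.values()
                   if r["month"] == month)


store = ToyStore()
store.put(1, "2008-06", 100)
store.put(2, "2008-06", 50)
store.refresh_index()

store.put(2, "2008-07", 50)  # move invoice 2 to July; index not yet refreshed
stale_total = store.total_by_index("2008-06")   # still counts invoice 2
walked_total = store.total_by_walk("2008-06")   # reads every record
```

In this toy, the indexed total for June still includes the moved invoice until the index catches up, while walking every record gives the right answer at the cost of touching all of them.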

So where is Google App Engine the right choice?  First, it’s free.   It’s a good way of learning how to write for application servers.  There are some good tutorials and videos and off you go.  If your application needs a lot of bandwidth downloading from elsewhere on the internet, it’s probably a good choice.  So if you want to try out an idea that’s not for making money, go for it.

Where is Google App Engine the wrong choice?  If you are trying to make money, it’s right out.  First, the license includes clauses that require you to indemnify Google if the API infringes patents.  A Fortune 500 company won’t let you agree to that.  Second, its reliability wants to get up to five nines (0.99999) but is hovering between one and two nines.  Third, it’s really early tech.  You can watch lots of bugs get filed and occasionally fixed, but they are in the stage of getting it feature complete first.

In conclusion,  I want an auto-driving car and I want my great application engine.  Google App Engine is like getting a taxi.   A taxi does the driving, but I still need to know directions and road conditions, and I still get into car accidents.   With Google App Engine, I still need to know the whole stack of languages, my site still gets hacked, and it still goes down a lot.  But you can see the future in Google App Engine, and it may turn into something great.

[Clarifications and follow-ups for this article got a bit long.  I added some expansions on this blog post.]

Coding
Reviews
Write-up

KDE 4: Crud and the Quest for “The One Last and True Windowing System”

80% of effects come from 20% of causes
— Pareto Principle by Vilfredo Frederico Damaso Pareto, 1906

90% of Everything is Crud
– Sturgeon’s Law by Theodore Sturgeon, 1956

100% of Everything is Crud
– Linear extrapolation of above, 2006.

There seems to have been a natural tendency for us to look at the past as some magical time when quality mattered. Really, only the pure quality of finished work survived. We forget the uncountable steps towards quality.

In the world of desktop environments, the journey towards quality continues. KDE 4.1 makes a step, following trends, and aiming to be The One Last and True Windowing System.

Quality Steps and Missteps

I installed Kubuntu with KDE 4 on my laptop, and am using it to write this post. That it is written at all shows a minimum level of quality. That the post starts with these quotes shows a maximum level.

I find immediate problems when installing. The very first action, the “Read Me” during the LiveCD boot, comes up in a clipped and illegible font. During partitioning the disks, the progress bar hangs at 0% for ten minutes. The task bar lacks resizing and basic functionality. The quick launcher, Katapult, disappeared. A new program launcher experiment falls flat with some buttons activating by click and others by hovering the mouse.  Konqueror still crashes.  I discover new bugs a few times per hour.  These are the nits and bugs of a new system.

The subtle problems are the problems repeated from previous years: the LILO boot system that unhelpfully refers to Vista as “longhorn” and Kubuntu as “generic Ubuntu core”; the cobbleware of screen layout that has fonts too big for buttons, text too wide for dialogs, and odd alignments; the usual flakiness with power and wireless management. These problems persist for ages from expectation, difficulty, or blindness.

So quality is a step downward while the easy bugs are fixed. Some nice features, like FileLight, are a definite step up that I expect every other distribution to copy soon. KDE did buck the trend in releasing a quality downgrade.

Trends in Window Managers

KDE 4 follows the collective wisdom of other software competitors including GNOME, Sugar, Microsoft’s Windows line, and Apple’s OS X line. It tries to be different just like everyone else.  It adds new functionality in pieces and parts.  The hodgepodge of bundling, or smush, that makes up windowing systems includes GUI and interface candy, applications, and APIs.

Smush is not pejorative. Open source swaps in and out competing components and the windowing system selects components under its seal of approval and delineates APIs outside its control. Quality for each component is involved at several layers. When the power management on my laptop fails to hibernate before powering off, I could file a bug with KDE, Ubuntu, Debian, the Linux ACPI mailing list, hardware discovery, or just fix it myself. When searching for a workaround, it could exist anywhere.

KDE is following the trend towards aggressively cross-platform deployment. By using Qt 4 as its underlying graphics engine, KDE hopes to deploy on desktops (Linux, MS Windows, and OS X), cellular phones, and some embedded devices. It is currently hampered by Qt 4 having no LGPL or BSD license, requiring a special licensing cycle to deploy any commercial application. The previous Qt 4 copyright holder, Trolltech, needed this revenue to continue operations. The new holder, Nokia, may free the code in order to encourage wider adoption and easier developer migration to its cellular phones.

KDE also follows the practice of wrapping more and more functionality into API layer plug-ins. Rather than commit to a scripting language, Kross wraps or interfaces to Python, Ruby and JavaScript. Rather than commit to a multimedia engine, Phonon creates one more layer of indirection for a common interface to GStreamer, QuickTime, DirectShow, and others. The new release switches many of the wrappers so now it is Phonon (multimedia), Solid (device integration), Plasma (a new desktop), Kross (scripting), DXS (application data updates), Decibel (human communications), and D-BUS replacing DCOP (application messaging). It is unclear if “one more layer of indirection” will be the correct solution in the long run.

In the long run, of course, KDE hopes to birth “The One Last and True Windowing System.”

The One Last and True Windowing System

Results 1 – 10 of about 1,040
Google search for “One API to Rule Them All”

Windowing systems race towards the goals of full functionality, sweet abstractions, and wide deployment with the winner creating a work that will last decades. Each release brings new experiments and implements growing standards. Implementations trump ideas. Differing implementations get abstracted and merged. Engineers have always raced towards sweet solutions.

Software does get finished. We still use the C programming language after more than a third of a century. The syntax, style, and assumptions are passed to new languages such as C++, Python and Ruby. Other contenders failed. Hardware architectures now handle pointer indirections, sequential arrays, and null terminated strings. C goes through occasional updates and forks, but it endures. Also enduring is most of the Internet stack, including TCP/IP, DNS/BIND, and packet switching; HTTP protocols, SQL, and many more. These become the common parts upon which we solve new problems. Many hope to finish writing the desktop software.

FreeDesktop.org gently pushes towards standardization by facilitating discussion. It speeds the process of making compatible, then integrating, then merging competing development that has become similar. Its influence on KDE is unmistakable.

So, will KDE give birth to the “One Last and True Windowing System” that lasts for decades?

Conclusions

KDE took a significant risk. Its adoption of Qt 4, dropping of DCOP, and making so many changes significantly hurt its quality. On the other hand, it might provide a new level of functionality to catapult into the top three windowing systems. Time will tell. For people just wanting a desktop to work, stay away for six months.

Coding
Reviews
Web
Write-up

Fun with Programming a Technology Tree

Technology Tree for Civilization version 2

I spent over a day writing a silly Python program to read in a Civilization 2 Technology Tree.

I learned:

  • Python assert statements tend to fail silently during common misuse. Blogged about it. Suggested fixing it on the Py3k mailing list. Guido says there is a SyntaxWarning now.
  • Google docs appears to use the python csv module to export to csv. Their spreadsheet works fairly well. I like the auto-save/auto-versioning.
  • The csv package is pretty inflexible. It cannot discover the dialect of comma-separated-vague file it is passed. It works for, and is designed for, times when the export is either Excel or Python. The Dialects do allow you to set some other basic options.
  • nose is a great testing tool. nose.tools confuses pylint because of its on-the-fly playing with __all__. I should probably write a Wikipedia article on it.
  • Whenever debugging recursion, make a note both when calling the recursed function and when returning. It makes digging through the log much easier.
  • Nothing tests your code like a large, real world, example.
  • The logging function rocks far less well. I filed a bug on it: it picks up the wrong %(filename)s. This bug has apparently been going back and forth for years.
  • Reading an input file with proper validation is about 10x the effort of reading it without.
  • Testing code is fairly easy and a bit bulky. Its real cost is that it forces that 10x effort in validating the input in order to pass the tests. I could get behind Test Driven Development.
  • The Civilization 2 Technology Tree has five errors, including two “Destroyer” units and a bunch of redundant dependencies, such as Fusion Power doesn’t need to depend on Nuclear Power.
  • Sets in Python work well.
  • WordPress has syntax coloring plug-ins that work fairly well, and it can handle arbitrary files.
  • Programming is still fun.
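Two of the points above, logging on both call and return while recursing and finding redundant dependencies with sets, fit in one small sketch. This is not the tech.py from this post; the function names and the four-entry sample tree are made up for illustration, and the code assumes the tree has no cycles.

```python
import logging

log = logging.getLogger("techtree")


def all_prerequisites(tech, tree, depth=0):
    """Return every direct and indirect prerequisite of tech (acyclic tree)."""
    indent = "  " * depth
    log.debug("%scall   all_prerequisites(%r)", indent, tech)
    found = set()
    for parent in tree.get(tech, ()):
        found.add(parent)
        found |= all_prerequisites(parent, tree, depth + 1)
    log.debug("%sreturn all_prerequisites(%r) -> %r", indent, tech, sorted(found))
    return found


def redundant_dependencies(tech, tree):
    """Direct prerequisites that are already implied by another direct one."""
    direct = set(tree.get(tech, ()))
    implied = set()
    for parent in direct:
        implied |= all_prerequisites(parent, tree)
    return direct & implied


# Illustrative sample only, not the contents of civtech.csv.
SAMPLE_TREE = {
    "Fusion Power": ["Nuclear Power", "Superconductor"],
    "Superconductor": ["Nuclear Power", "Laser"],
    "Nuclear Power": [],
    "Laser": [],
}

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    print(redundant_dependencies("Fusion Power", SAMPLE_TREE))
```

The indentation by depth is what makes the log readable: each call line has a matching return line at the same indent.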

So, with no plans to do anything with this:

tech.py — This is Python code. It loads the technology tree and doesn’t do anything with it.

civtech.csv — A CSV file with the Civilization 2 technology tree

CivChart — A GoogleDoc spreadsheet with that same technology tree

This post cleverly delayed for a few days to space out my postings. :)

Coding
Personal
Write-up

Python Assert Fails Silently?

Problem: Python assert statements are prone to silently fail in obvious misuse.

Solution: Modify Python assert statement to “assert condition as message”

Python usually does the right thing ™. That is, usually a programmer’s code does what is expected without odd language gotchas. Here is one of the gotchas:

    def do_with_file(filename):
        assert(len(filename)>0 and filename[0] != ' ', 'filename (%s) not valid' % filename)
        …

Seems reasonable? Sorry, that assert is equivalent to:

    assert True

because your parentheses made a tuple, and a non-empty tuple is always true. You meant to type this:

    assert len(filename)>0 and filename[0] != ' ', 'filename (%s) not valid' % filename

Python 3.0 should be modified to require this:

    assert len(filename)>0 and filename[0] != ' ' as 'filename (%s) not valid' % filename

or just make assert a built-in function and, therefore, require the parentheses.
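A small sketch that shows the difference at run time. The tuple is built in a separate variable so that the example loads without the SyntaxWarning that Python now raises for a literal parenthesized assert; the function names are mine.

```python
def check_with_tuple(filename):
    # Same shape as assert(condition, message): the parentheses build a tuple.
    condition_and_message = (len(filename) > 0 and filename[0] != " ",
                             "filename (%s) not valid" % filename)
    assert condition_and_message  # a non-empty tuple is always true
    return "passed"


def check_properly(filename):
    assert len(filename) > 0 and filename[0] != " ", \
        "filename (%s) not valid" % filename
    return "passed"


def raises_assertion(check, filename):
    """Report whether check(filename) raises AssertionError."""
    try:
        check(filename)
    except AssertionError:
        return True
    return False
```

With an empty filename, the tuple version passes silently while the plain version raises. Both checks assume Python is not run with -O, which strips assert statements entirely.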

Coding
Idea
Ideas

FavIcon integrated editor

Problem: I want a web browser plug-in that can edit the FavIcon for my bookmarks.

Solution: Build one.

There exist dozens of online FavIcon editors on the web. The FavIcon is the small graphic in the corner of the address bar.  Some tools, like FavIcon Picker 2, let bookmarks be reduced to just the icon, making a compact tool bar of bookmarks. Unfortunately, not every site has a favorite icon, and different services share the same favorite icon. For example, Google, Google (Linux) search, and Google Analytics have the same favicon.  I want to make my own icon as a minor variation of the single standard icon.  Alternately, I want to make dirt simple favicons for sites that have none.

My feature list for such a program would include:

  • Take an existing favicon and alter the colors.  A red “G” might be search; a blue one might be analytics.
  • Add a letter overlay onto existing favicons, so my “Google Linux” button has an “L” on it.
  • Create a one or two character favicon so that the cool Widgets site with no icon could get a “red Wi” favicon in my bookmarks.

None of this is particularly odd or difficult.  It’s a matter of a GUI front end, some data files, and some shelling out to ImageMagick to do the actual work.  Hmm….
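As a rough sketch of the "shelling out" part, here is how the one or two character favicon might be built. It assumes the ImageMagick convert command is installed and on the PATH (newer ImageMagick releases prefer the name magick); the function names, the 16 pixel size, and the white text are my choices, not anything from an existing tool.

```python
import subprocess


def text_favicon_command(text, background, output_path, size=16):
    """Build the ImageMagick argument list for a solid square with centered text."""
    return [
        "convert",
        "-size", f"{size}x{size}", f"xc:{background}",  # solid color canvas
        "-gravity", "center",
        "-fill", "white",
        "-pointsize", str(size - 6),
        "-annotate", "+0+0", text,
        output_path,
    ]


def make_text_favicon(text, background, output_path):
    """Run ImageMagick; raises CalledProcessError if the command fails."""
    subprocess.run(text_favicon_command(text, background, output_path), check=True)
```

For the Widgets example, make_text_favicon("Wi", "red", "widgets.ico") would be the call. Passing the arguments as a list avoids shell quoting problems with odd characters in the text.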

Coding
Idea
Ideas
Products

Writing Python GUI Toolkits for Testability (for PyGTK et al)

Problem: Testing GUIs tends to be hard.

Writing a reasonable test for GUIs usually involves contortions to find the correct widget and values. For example, a test might want to confirm that “the font is now bold for the third text field, named system_danger_level, in the hbox in the floating frame in the second panel in the third tab bar in the dialog box”. Figuring out how to tell if the test passed is usually difficult.

Solution: Add a function in the GUI framework that returns a single large data structure for the state of the GUI. Use standard Python programming to navigate it.

The test becomes:

assert gobject.dump("system_danger_level")["font"]["style"] == "bold"

This is one of my poorly researched ideas: it came up while talking with Shandy before Mark Shuttleworth’s talk last night. All GUI frameworks are inherently a bit crufty and hard to navigate. On the other hand, the data types in Python are rich: dictionaries, arrays, nested structures, etc. While handing such a data structure to PyGTK to modify the GUI might require a lot of writing, asking PyGTK to disgorge such a data structure is far easier.

Consider adding it to the GObject functions. Call a new gobject.dump_main_context() and get a huge Python data structure back. It might have lots of redundant methods of finding the same data. For example, a tree of all the contexts or dialog boxes and the usual tree objects inside that are grabbed by tools like Guitar’s GUI Ripper. It might also have a handy hash of object id’s and their associated sub-records, like an index into the big tree.

Those who decry the memory and time cost of creating this tree might legitimately wish for a second function, known simply as gobject.dump(). It would take a single identifier tag and return a single object from that tag ‘downwards’ in detail. A well published heuristic would have it “do the right thing” when given a tag that exists in multiple places.
This feels like an “implement it first and then see if it’s useful” type of hack.
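To make the idea concrete, here is a toy version that runs without any GUI toolkit. ToyWidget, dump_tree, and index_by_name are hypothetical stand-ins; nothing here is an existing PyGTK or GObject call.

```python
class ToyWidget:
    """Stand-in for a GUI widget; not a PyGTK class."""

    def __init__(self, name, font=None, children=()):
        self.name = name
        self.font = font or {}
        self.children = list(children)


def dump_tree(widget):
    """Return a nested dict describing widget and everything below it."""
    return {
        "name": widget.name,
        "font": dict(widget.font),
        "children": [dump_tree(child) for child in widget.children],
    }


def index_by_name(node, index=None):
    """Flatten the dumped tree into a name -> sub-record lookup."""
    if index is None:
        index = {}
    index.setdefault(node["name"], node)  # heuristic: first match wins
    for child in node["children"]:
        index_by_name(child, index)
    return index


dialog = ToyWidget("dialog", children=[
    ToyWidget("tab_3", children=[
        ToyWidget("system_danger_level", font={"style": "bold"}),
    ]),
])
state = index_by_name(dump_tree(dialog))
assert state["system_danger_level"]["font"]["style"] == "bold"
```

The "first match wins" rule in index_by_name is one possible heuristic for duplicate tags; a real implementation would need to publish whichever rule it picks.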

Coding
Idea
Ideas

Rethinking the Font Chooser

Problem: Font chooser boxes cause more load as one has more fonts.

Font choosing options have barely evolved from the very first days of the Lisa where a user could choose between *gasp* five fonts. Now, there are thousands of fonts and the font chooser boxes look the same.

Solution: Visualize fonts spatially according to useful categorizations

There exist various categorization methods for fonts: technical, quantitative, typographical, and subjective. Pick some within the list: measured reading speed (quantitative), bitmap versus stroke based (technical), or rankings (subjective); then map the fonts onto a sphere, fictional map (the isle of bitmaps), or plain old grid. That way, a user can pick a font from among other fonts that share the same essential characteristics, e.g., slow reading speed.

Ah, but were ideas code.

Coding
Idea
Ideas

Extended Regular Expressions

Problem:   Regular Expressions should have more core types

Regular expression recipes live on websites.   When you want that U.S. telephone number regular expression, like “((?P<areacode>:\d{3})?\s{0,2}….”, you paste this huge hash of characters into your regular expressions.

What happens is that regular expressions become de facto lexers.  We want them to recognize the various obvious forms of telephone numbers, like “(415) 555 – 5555” or “415.555.5555” and so on.   We want the right answer, without rebuilding a recipe library from scratch each time.

Solution:  Build an extended regular expression engine that knows the basic recipes.

It would be good to have a regular expression subclass that recognizes additional special characters and functionality.   Previous extensions in regular expressions, notably in Perl, have added core building blocks such as less greedy patterns, named sub-expressions, repetition matching, and so on.  These extensions would provide easy access to common parsing problems.  I expect a good set of candidates would include:

  • Dates, like “02-03-2007” or “03-Feb-07” or “February 3”.
  • Times, like “2:45 a.m. GST” or “10:04:23GMT+3”
  • Credit card numbers, with or without spaces.
  • Floats, like “+2.234E20”, “3.1415”, and “42”.
  • U.S. Phone numbers, like “(301) 342-3222 ext. 2432”
  • U.S. Address Lines, like “City of Industry, MN 23423-1322”
  • Names, like “Dr. Phillip P. R. Radnov IV, MD”
  • Overseas Phone Numbers, like “+23 234 12333”
  • Quoted Strings in CSV formats

So, you can see that I’m looking at a lot of common, odd, and exception prone text processing that straddles the line between lexing and parsing.  Recognizing the half dozen forms of writing a number and then returning a number is typically done in a lexer by providing multiple rules.   Alternately, it is done by the parser in an annoyingly repetitive manner.   It should be done in a library further down, such as regular expressions.  Too many applications use different recipes and cause both incompatibilities and bugs.

One method would be to have these as macros in a regular expression class, and provide a canonical example for post parsing.   Convenience functions would provide access by field, e.g.,

>>> x = re.match("(?Date)\s+(?USPhone)", "23-Feb-07  415.234.9902")

>>> print x.group(1)   # What string matched the Date?

02-23-2007         # See, we substitute in the easy to parse date.

>>> print x.areacode(2)   # Areacode from match of group 2

415

It feels like work, but doable to make this run quickly.  That is, for the convenience functions to run quickly.   The hard part would be correctly reporting when a regular expression using extensions might give unexpected results.   For example, zip code followed by a number is ambiguous for “20423-1234.234”.  Is that (“20423-1234”, “0.234”) or (“20423”, “-1234.234”)?   That problem is hard in regular expressions now.
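The macro half of the idea can be prototyped today by expanding the macros into ordinary patterns before handing them to the standard re module. This is only a sketch: MACROS and expand are hypothetical names, the two recipes are deliberately loose, and it returns the matched text as-is rather than a canonical form.

```python
import re

# Hypothetical recipe table; the patterns are loose on purpose.
MACROS = {
    "Date": r"\d{1,2}-(?:\d{1,2}|[A-Za-z]{3})-\d{2,4}",
    "USPhone": r"\(?(?P<areacode>\d{3})\)?[\s.-]{0,3}\d{3}[\s.-]{0,3}\d{4}",
}

# Only names present in MACROS are expanded, so real syntax such as (?i)
# or (?P<name>...) is left untouched.
_MACRO_PATTERN = re.compile(
    r"\(\?(" + "|".join(re.escape(name) for name in MACROS) + r")\)")


def expand(pattern):
    """Replace each (?Name) macro with a capturing group holding its recipe.

    Using USPhone twice in one pattern would fail, because the named
    group 'areacode' would be defined twice.
    """
    return _MACRO_PATTERN.sub(lambda m: "(" + MACROS[m.group(1)] + ")", pattern)


match = re.match(expand(r"(?Date)\s+(?USPhone)"), "23-Feb-07  415.234.9902")
date_text = match.group(1)
phone_text = match.group(2)
area_code = match.group("areacode")
```

Here the area code comes back through an ordinary named group rather than a convenience function; the ambiguity problem described above is untouched by this approach.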

Coding
Idea
Ideas
Invention
Libraries

OLPC Scrabble with Definitions

This is a cool feature I have yet to see implemented.

Problem: Scrabble is a good way to learn spelling, but people make words they don’t understand.

Scrabble style games encourage spelling by having people put together letters in odd combinations. For less strenuous play, most computer versions of the game dispense with the ‘challenge a misspelled word’ option. This leads players to try to explore new words by trial and error. When a young player tries to put an ‘H’ after ‘PIT’, they discover a new word.

Solution: Provide definitions of the word inside the game.

Tool tips provide one simple solution. Hovering the cursor over a word would pop up a tip containing the definition of the word as helpfully provided by wiktionary. This provides another level of passive learning in the same experience.

For advanced players, one could play against someone speaking a different language. Words of both players’ languages would be accepted, and the tool-tips would provide spelling and pronunciation hints. While one could imagine triggering off videos, such as matching Colingo videos, most additional help would come from chatting with the other player.

Lots of possibilities.

Coding
Idea
Ideas