Entries Tagged 'Libraries'

Extended Regular Expressions

Problem: Regular Expressions should have more core types

Regular expression recipes live on websites. When you want that U.S. telephone number regular expression, like “((?P<areacode>\d{3})?\s{0,2}…”, you paste this huge hash into your own regular expressions.

In effect, regular expressions become de facto lexers. We want them to recognize the various obvious forms of a telephone number, like “(415) 555 – 5555” or “415.555.5555”, and we want the right answer without rebuilding a recipe library from scratch each time.
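To make the problem concrete, here is a sketch of the kind of recipe that ends up pasted into programs, written in Python. The pattern and the `parse_us_phone` helper are illustrative assumptions rather than a vetted recipe: they accept the forms above plus a missing area code, and they ignore extensions such as “ext. 2432”. Note that Python does not allow two groups with the same name, so the parenthesized and bare area codes need separate names and a merge step afterwards.

```python
import re

# Illustrative U.S. phone recipe: optional area code, then a 3-digit exchange
# and a 4-digit line number, separated by spaces, dots, hyphens, or en dashes.
US_PHONE = re.compile(
    r"""
    (?:                                  # optional area code, either
        \(\s*(?P<areacode>\d{3})\s*\)    #   parenthesized, like (415)
      | (?P<areacode_bare>\d{3})         #   or bare, like 415
    )?
    [\s.\-\u2013]*                       # separators: space, dot, hyphen, en dash
    (?P<exchange>\d{3})
    [\s.\-\u2013]*
    (?P<line>\d{4})
    """,
    re.VERBOSE,
)


def parse_us_phone(text):
    """Return (areacode, exchange, line), or None if text is not a phone number.

    The area code is None when it was left out.
    """
    m = US_PHONE.fullmatch(text.strip())
    if m is None:
        return None
    # Only one of the two area-code groups can have matched.
    area = m.group("areacode") or m.group("areacode_bare")
    return area, m.group("exchange"), m.group("line")


print(parse_us_phone("(415) 555 \u2013 5555"))
print(parse_us_phone("415.555.5555"))
```

Every application that needs phone numbers ends up carrying some variant of this, each with slightly different rules about which forms it accepts.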

Solution: Build an extended regular expression engine that knows the basic recipes.

It would be good to have a regular expression subclass that recognizes additional special characters and functionality. Earlier extensions to regular expressions, notably in Perl, added core building blocks such as non-greedy (lazy) patterns, named sub-expressions, repetition matching, and so on. The extensions proposed here would give easy access to solutions for common parsing problems. I expect a good set of candidates would include:

  • Dates, like “02-03-2007”, “03-Feb-07”, or “February 3”.
  • Times, like “2:45 a.m. GST” or “10:04:23GMT+3”.
  • Credit card numbers, with or without spaces.
  • Floats, like “+2.234E20”, “3.1415”, and “42”.
  • U.S. phone numbers, like “(301) 342-3222 ext. 2432”.
  • U.S. address lines, like “City of Industry, MN 23423-1322”.
  • Names, like “Dr. Phillip P. R. Radnov IV, MD”.
  • Overseas phone numbers, like “+23 234 12333”.
  • Quoted strings in CSV format.

So you can see that I’m looking at a lot of common, odd, and exception-prone text processing that straddles the line between lexing and parsing. Recognizing the half-dozen ways of writing a number and then returning the number is typically done in a lexer with multiple rules, or else in the parser in an annoyingly repetitive manner. It should be done in a library further down the stack, such as the regular expression library. Too many applications use different recipes, causing both incompatibilities and bugs.
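The float examples from the list show the kind of rule a lexer keeps repeating. Here is a sketch, assuming Python, of a single shared recipe that both recognizes the common forms and returns the value. `NUMBER` and `parse_number` are illustrative names, and the pattern deliberately leaves out forms such as thousands separators or hexadecimal.

```python
import re

# One shared recipe for common ways of writing a number: an optional sign,
# digits with an optional fractional part (or a bare fraction like ".5"),
# and an optional exponent. Illustrative, not a standard pattern.
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def parse_number(text):
    """Recognize a number and return it as an int or float, else None."""
    m = NUMBER.fullmatch(text.strip())
    if m is None:
        return None
    s = m.group()
    # Plain integers stay ints; a decimal point or exponent makes a float.
    if any(c in s for c in ".eE"):
        return float(s)
    return int(s)


for sample in ("+2.234E20", "3.1415", "42", "4x2"):
    print(sample, "->", parse_number(sample))
```

The point is less the pattern itself than where it lives: one recipe, one conversion step, shared by every caller instead of re-derived in each lexer.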

One approach would be to provide these as macros in a regular expression class, each with a canonical form for post-parsing. Convenience functions would provide access by field, e.g.,

>>> x = re.match(r"(?Date)\s+(?USPhone)", "23-Feb-07  415.234.9902")
>>> print(x.group(1))     # What did the Date match?
02-23-2007                # See, we substitute in the easy-to-parse date.
>>> print(x.areacode(2))  # Area code from the match of group 2
415
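The session above can be approximated on top of today's `re` module by expanding macros into ordinary regex text before matching. The sketch below is hypothetical throughout: the `MACROS` table, `expand`, `xmatch`, and `ExtendedMatch` are invented names, it only knows two date forms and one phone recipe, it treats numeric dates as month-first, and it assumes the macros are the only capturing groups in the pattern. It also puts `areacode` on the match object rather than on the `re` module, since the field has to come from a particular match.

```python
import re
from datetime import datetime

# Hypothetical macro table. Each recipe uses only non-capturing groups, so an
# expanded macro contributes exactly one numbered group to the final pattern.
MACROS = {
    # "23-Feb-07" or "02-23-2007"
    "Date": r"\d{1,2}-(?:[A-Za-z]{3}-\d{2}|\d{1,2}-\d{4})",
    # "(415) 234-9902", "415.234.9902", "234-9902", ...
    "USPhone": r"(?:\(\d{3}\)|\d{3})?[\s.\-]*\d{3}[\s.\-]*\d{4}",
}

# A macro reference looks like (?Name) with a capitalized name.
MACRO_REF = re.compile(r"\(\?([A-Z][A-Za-z]*)\)")


def expand(pattern):
    """Replace each (?Name) with a capturing group holding its recipe."""
    return MACRO_REF.sub(lambda m: "(" + MACROS[m.group(1)] + ")", pattern)


def canonical_date(text):
    """Return the date as MM-DD-YYYY; numeric dates are assumed month-first."""
    for fmt in ("%d-%b-%y", "%m-%d-%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%m-%d-%Y")
        except ValueError:
            continue
    return text  # not a form we know how to canonicalize; leave it alone


CANONICALIZERS = {"Date": canonical_date}


class ExtendedMatch:
    """Wraps a re match so that group(n) returns canonical forms of macros."""

    def __init__(self, match, names):
        self._match = match
        self._names = names  # macro name of group 1, group 2, ...

    def raw(self, n):
        """The text exactly as it appeared in the input."""
        return self._match.group(n)

    def group(self, n):
        text = self._match.group(n)
        name = self._names[n - 1] if n > 0 else None
        fix = CANONICALIZERS.get(name)
        return fix(text) if fix and text is not None else text

    def areacode(self, n):
        """Area code of a USPhone group, or None if it was left out."""
        digits = re.sub(r"\D", "", self._match.group(n))
        return digits[:-7] or None


def xmatch(pattern, text):
    """Like re.match, but understands (?Name) macros.

    Assumes the macros are the only capturing groups in the pattern.
    """
    m = re.match(expand(pattern), text)
    return ExtendedMatch(m, MACRO_REF.findall(pattern)) if m else None


x = xmatch(r"(?Date)\s+(?USPhone)", "23-Feb-07  415.234.9902")
print(x.group(1))     # canonical date
print(x.areacode(2))  # area code from group 2
```

Expanding macros to plain text keeps matching as fast as any other regex; the extra cost falls only on the convenience functions, which re-read the captured text.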

Making this run quickly, at least for the convenience functions, feels like work but is doable. The hard part would be correctly reporting when a regular expression using extensions might give unexpected results. For example, a ZIP code followed by a number is ambiguous for “20423-1234.234”: is that (“20423-1234”, “0.234”) or (“20423”, “-1234.234”)? That problem is already hard with today's regular expressions.
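A small demonstration of that ambiguity with Python's `re`, assuming a ZIP+4 code is written `\d{5}-\d{4}` and a number may carry a sign. The patterns are illustrative. Each reading matches the whole string on its own (the first reading's number text is “.234”, i.e. 0.234), and once they are combined into one pattern the engine silently picks one reading without any warning.

```python
import re

text = "20423-1234.234"

# Reading 1: ZIP+4 followed by an unsigned number (".234").
zip4_then_num = re.compile(r"(\d{5}-\d{4})(\d*\.\d+)")
# Reading 2: plain ZIP followed by a signed number ("-1234.234").
zip_then_signed = re.compile(r"(\d{5})([+-]\d+\.\d+)")

print(zip4_then_num.fullmatch(text).groups())
print(zip_then_signed.fullmatch(text).groups())

# Combined: the greedy optional "-\d{4}" is tried first, so the ZIP+4
# reading wins and the other valid reading is never reported.
combined = re.compile(r"(\d{5}(?:-\d{4})?)([+-]?\d*\.\d+)")
print(combined.fullmatch(text).groups())
```

An extended engine would need to notice that both readings succeed and say so, rather than quietly returning the first one its backtracking order happens to find.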

Libraries in the Modern Age: Mall Filler Storefronts

Problem: Empty Mall Slots and No Libraries

Solution: Instant Temporary Libraries

Malls need full storefronts to attract traffic; too many empty storefronts lead to fewer shoppers browsing through the mall, and the mall can cease to be a destination. On the other hand, a mall wants to charge a premium for its retail space, and stores want long-term leases to capitalize on their investments, so temporarily dead mall space is a normal part of life. Less innovative malls content themselves with renting the space out to a fly-by-night “Christmas Store”.

Turn the mall space into a temporary library.

Overnight, a branch library could exist. Many of the usual library and business rules could be relaxed for this civic-minded venture, including rent. The library could pop into existence with nothing more than a couple hundred paperback books, a simple checkout system (or even an honor system), and a box for returns. It would be only slightly more challenging than running a bookmobile. If Internet access were available, the library could also offer its entire catalog with “click here to have it ready for you next visit” functionality, such as my library provides. People would come back to the mall to return library books and to pick up the books they ordered. As long as you are there, you might as well have lunch or do a little shopping.

This type of approach has worked in the past. Red Ink Studios occupied space in Santana Row rent-free for some time, with a clear understanding that it would vacate within thirty days of a paying tenant being found. This generated more foot traffic to the rest of the outdoor mall than another empty storefront would have.

The most contentious obstacle to adoption is simple: it’s different. Neither property management nor public libraries are particularly innovative. It would be good for libraries, giving them increased visibility and a location with new patrons. It would be good for malls, bringing increased traffic and new customers who might normally shy away from malls.

Libraries reel from the rapid changes brought by the Internet, digital media, and our changing patterns of connectedness. Many libraries could rise to these challenges.