org.hd.d.pg2k.clApp.offline
Class MakeCCTLDFromIPPrefixProperties

java.lang.Object
  extended by org.hd.d.pg2k.clApp.offline.MakeCCTLDFromIPPrefixProperties

public class MakeCCTLDFromIPPrefixProperties
extends java.lang.Object

Off-line utility to create properties file for IPv4 to ccTLD mapping.


Field Summary
private static int CONTINENT_PREFIX_LOSSY_COMPRESS_THRESHOLD
          Threshold (out of 256) at which we will lump countries/regions into a continent when doing lossy cc-prefix-table compression; strictly positive.
static java.lang.String INPUT_FORMAT_HOSTIP
          Input data format name for MySQL-dump download from www.hostIP.info (circa 2006/04).
static java.util.SortedSet<java.lang.String> INPUT_FORMATS
          All allowed input formats; immutable and never null.
private static int REGION_PREFIX_LOSSY_COMPRESS_THRESHOLD
          Threshold (out of 256) at which we will lump countries into a region when doing lossy cc-prefix-table compression; strictly positive.
private static int UNCONDITIONAL_PREFIX_LOSSY_COMPRESS_THRESHOLD
          Threshold (out of 256) at which we will discard minority values when doing lossy cc-prefix-table compression; strictly positive.
 
Constructor Summary
private MakeCCTLDFromIPPrefixProperties()
          Prevent instances from being constructed.
 
Method Summary
private static void _lossyTrimPrefixesAtLength(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> map, int lengthToTrim)
          Try to trim away leaves at the given prefix length.
static void dumpPrefixMap(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> map, java.io.PrintWriter pw)
          Dump an IP-prefix map as-is to the given Writer.
private static java.lang.String getNextRecordStarting(java.io.BufferedReader r, java.lang.String lineStart)
          Returns the next line/record starting with the specified String, else null at EOF.
private static java.util.Map<AddrTools.AddrPrefix,java.lang.String> loadData(java.io.File inputData, java.lang.String inputFormat)
          Load external IP/location data into internal-style table; never null.
private static java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> lossyCompressCcTLDFromIPPrefixMap(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> mapIn)
          Perform (lossy) compression on an IP-prefix-to-ccTLD map.
static void main(java.lang.String[] args)
          An entry point to load the prefix map and write it out in a more compact format.
private static java.lang.String[][] parseINSERTRecord(java.lang.String record, int expectedFields, boolean ignoreBadRecords)
          Parse a MySQL insert record from a data dump.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INPUT_FORMAT_HOSTIP

public static final java.lang.String INPUT_FORMAT_HOSTIP
Input data format name for MySQL-dump download from www.hostIP.info (circa 2006/04).

See Also:
Constant Field Values

INPUT_FORMATS

public static final java.util.SortedSet<java.lang.String> INPUT_FORMATS
All allowed input formats; immutable and never null.


UNCONDITIONAL_PREFIX_LOSSY_COMPRESS_THRESHOLD

private static final int UNCONDITIONAL_PREFIX_LOSSY_COMPRESS_THRESHOLD
Threshold (out of 256) at which we will discard minority values when doing lossy cc-prefix-table compression; strictly positive. If all but a small minority of the entries in a given block share the same value, and the number of minority entries is less than or equal to this threshold, then we pretend that the exceptions do not exist so that we can trim the tree at this point.

This is computed so that given the rough expected maximum cost of a worst-possible routing error (paying transit and getting ropey connectivity rather than a fast free local connection) we will not on average pay more than about twice the optimum routing costs because of our lossy encoding. This assumes that we'd never do much better than "country" level routing.
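
As a rough illustration of the minority rule described above, the following Java sketch counts the entries that disagree with the most common value in a block and reports that value only when the disagreeing count is within the threshold. The class name, the method name, and the List<String> representation of a block are assumptions made for this example; the actual value of the constant and the real implementation are not shown in this documentation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinorityTrimSketch {
    /**
     * Returns the most common value in the block if the entries that
     * disagree with it number no more than the threshold, else null
     * (meaning the block should be left untouched).
     * Ties for most common value are broken arbitrarily in this sketch.
     */
    static String dominantIfTrimmable(List<String> blockValues, int threshold) {
        if (blockValues.isEmpty()) { return null; }
        // Count how often each value occurs in the block.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : blockValues) { counts.merge(v, 1, Integer::sum); }
        // Find the most common value.
        String dominant = null;
        int best = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) { best = e.getValue(); dominant = e.getKey(); }
        }
        // Everything that is not the dominant value is the minority.
        int minority = blockValues.size() - best;
        return (minority <= threshold) ? dominant : null;
    }
}
```

With a block of three "uk" entries and one "fr" entry, a threshold of 1 would allow the block to collapse to "uk", while a threshold of 0 would not.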


REGION_PREFIX_LOSSY_COMPRESS_THRESHOLD

private static final int REGION_PREFIX_LOSSY_COMPRESS_THRESHOLD
Threshold (out of 256) at which we will lump countries into a region when doing lossy cc-prefix-table compression; strictly positive. Provided that all but this number of entries in a given block are in the same region as the dominant ccTLD in that block, then during lossy compression we can replace them with that dominant ccTLD.

The cost of this is assumed to be mainly routing cost for the wrong country within a region when claiming the dominant ccTLD. (We currently neglect the possible cost of completely-wrong routing out-of-region.)
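
The region rule can be sketched in a similar way. In the example below, regionOf is a hypothetical ccTLD-to-region lookup supplied by the caller, and the comparison "at most threshold entries outside the region" is an assumption about how the threshold is applied; neither is taken from the real class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RegionLumpSketch {
    /**
     * Returns the dominant ccTLD if at most `threshold` entries in the block
     * fall outside the dominant ccTLD's region, else null.
     * regionOf is a hypothetical lookup from ccTLD to region name.
     */
    static String dominantIfSameRegion(List<String> blockValues,
                                       Map<String, String> regionOf,
                                       int threshold) {
        // Find the most common ccTLD in the block.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : blockValues) { counts.merge(v, 1, Integer::sum); }
        String dominant = null;
        int best = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > best) { best = e.getValue(); dominant = e.getKey(); }
        }
        if (dominant == null) { return null; }
        String region = regionOf.get(dominant);
        if (region == null) { return null; } // Unknown region: do not lump.
        // Count entries whose region differs from the dominant ccTLD's region.
        int outside = 0;
        for (String v : blockValues) {
            if (!region.equals(regionOf.get(v))) { outside++; }
        }
        return (outside <= threshold) ? dominant : null;
    }
}
```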


CONTINENT_PREFIX_LOSSY_COMPRESS_THRESHOLD

private static final int CONTINENT_PREFIX_LOSSY_COMPRESS_THRESHOLD
Threshold (out of 256) at which we will lump countries/regions into a continent when doing lossy cc-prefix-table compression; strictly positive. Provided that all but this number of entries in a given block are in the same continent as the dominant ccTLD or region in that block, then during lossy compression we can replace them with that dominant ccTLD/region.

The cost of this is assumed to be mainly routing cost for the wrong country within a continent when claiming the dominant ccTLD/region. (We currently neglect the possible cost of completely-wrong routing out-of-continent.)

Constructor Detail

MakeCCTLDFromIPPrefixProperties

private MakeCCTLDFromIPPrefixProperties()
Prevent instances from being constructed.

Method Detail

dumpPrefixMap

public static final void dumpPrefixMap(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> map,
                                       java.io.PrintWriter pw)
Dump an IP-prefix map as-is to the given Writer. This does no "optimisation" or other transformation.

The output is printed in sorted order by the address-prefix key.

The output should be directly suitable to form all or part of a ccTLDFromIPPrefix.properties set.
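
A minimal sketch of such a dump, assuming the keys are plain dotted-octet strings (standing in for AddrTools.AddrPrefix) and that each entry is written as one key=value line in the usual java.util.Properties style. The class and method names are hypothetical, and string keys sort lexicographically rather than in true address order.

```java
import java.io.PrintWriter;
import java.util.Map;
import java.util.SortedMap;

final class DumpPrefixMapSketch {
    /** Writes each entry as "prefix=ccTLD", in the map's own sort order. */
    static void dump(SortedMap<String, String> map, PrintWriter pw) {
        for (Map.Entry<String, String> e : map.entrySet()) {
            pw.println(e.getKey() + "=" + e.getValue());
        }
        pw.flush();
    }
}
```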


lossyCompressCcTLDFromIPPrefixMap

private static java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> lossyCompressCcTLDFromIPPrefixMap(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> mapIn)
Perform (lossy) compression on an IP-prefix-to-ccTLD map. This may use lossy compression of the loaded values where the saving is large and the implied increase in routing/performance cost is low.

We never delete "" entries to avoid semantic changes, nor top-level (1-octet) entries for clarity.


_lossyTrimPrefixesAtLength

private static void _lossyTrimPrefixesAtLength(java.util.SortedMap<AddrTools.AddrPrefix,java.lang.String> map,
                                               int lengthToTrim)
Try to trim away leaves at the given prefix length. We never remove "" entries.

We won't remove entries at the stated length that have longer sub-entries. This implies that trimming should be performed from the longest prefixes down to the shortest.

The map is updated in place.
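
The constraints above can be illustrated with a simplified sketch. It assumes dotted-octet string keys, measures prefix length in octets, and uses a deliberately simple removal rule (a leaf is dropped when its value equals that of its immediate parent prefix). That rule is an assumption chosen to keep the example short; the real method's threshold-based criteria are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

final class TrimLeavesSketch {
    /** Number of octets in a dotted prefix such as "81.2". */
    static int octets(String prefix) { return prefix.split("\\.").length; }

    /** True if the map holds any entry longer than, and below, the given key. */
    static boolean hasLongerSubEntry(SortedMap<String, String> map, String key) {
        String childStart = key + ".";
        for (String k : map.keySet()) {
            if (k.startsWith(childStart)) { return true; }
        }
        return false;
    }

    /** Trims redundant leaves at the given length; updates the map in place. */
    static void trimAtLength(SortedMap<String, String> map, int lengthToTrim) {
        List<String> toRemove = new ArrayList<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            String key = e.getKey();
            if (octets(key) != lengthToTrim) { continue; }
            if (e.getValue().isEmpty()) { continue; }        // Never remove "" entries.
            if (hasLongerSubEntry(map, key)) { continue; }   // Not a leaf.
            int cut = key.lastIndexOf('.');
            if (cut < 0) { continue; }                       // Top-level: no parent.
            String parent = key.substring(0, cut);
            if (e.getValue().equals(map.get(parent))) { toRemove.add(key); }
        }
        map.keySet().removeAll(toRemove);
    }
}
```

Calling trimAtLength for length 3 and then length 2 lets an entry such as "81.2" become a leaf before it is considered, which is why trimming from the longest prefixes down to the shortest matters.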


loadData

private static java.util.Map<AddrTools.AddrPrefix,java.lang.String> loadData(java.io.File inputData,
                                                                             java.lang.String inputFormat)
                                                                      throws java.io.IOException
Load external IP/location data into internal-style table; never null. This data can potentially be in one of several formats, each of which will require some massaging to get into our model.

Only input data records corresponding to ccTLDs explicitly listed in the "geo-proximity" data, i.e. those whose getCloseCCTLDs() result is non-empty, will be retained.

We construct and return an unsorted map for speed.

Parameters:
inputData - the (readable) input data file; never null
inputFormat - the format (one of INPUT_FORMATS); never null
Returns:
input data in our format, for "interesting" ccTLDs
Throws:
java.io.IOException

parseINSERTRecord

private static java.lang.String[][] parseINSERTRecord(java.lang.String record,
                                                      int expectedFields,
                                                      boolean ignoreBadRecords)
Parse a MySQL insert record from a data dump. This parses a MySQL record of the format:
INSERT INTO `table name` VALUES (field1,field2,...),...(...);
 

Parameters:
record - the full line; never null
expectedFields - the expected number of fields per record; non-negative
ignoreBadRecords - silently skip bad records that we cannot parse (for example because they contain our separator characters)
Returns:
zero or more rows, each with the specified number of fields; no null records or fields
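
A minimal sketch of parsing a record of this shape, assuming a naive split on "),(" between rows and on "," between fields. This is an illustration rather than the real implementation: the class name is hypothetical, quotes around field values are kept as-is, and throwing IllegalArgumentException when ignoreBadRecords is false is an assumption. A row whose field values contain a separator character ends up with the wrong field count and is treated as bad.

```java
import java.util.ArrayList;
import java.util.List;

final class ParseInsertSketch {
    static String[][] parse(String record, int expectedFields, boolean ignoreBadRecords) {
        final String marker = " VALUES ";
        int v = record.indexOf(marker);
        if (v < 0) { throw new IllegalArgumentException("not an INSERT ... VALUES record"); }
        String body = record.substring(v + marker.length()).trim();
        // Drop the trailing semicolon, then the outermost parentheses.
        if (body.endsWith(";")) { body = body.substring(0, body.length() - 1); }
        if (!body.startsWith("(") || !body.endsWith(")")) {
            throw new IllegalArgumentException("malformed VALUES list");
        }
        body = body.substring(1, body.length() - 1);
        List<String[]> rows = new ArrayList<>();
        for (String row : body.split("\\),\\(", -1)) {
            String[] fields = row.split(",", -1);
            if (fields.length != expectedFields) {
                if (ignoreBadRecords) { continue; } // Silently skip the bad row.
                throw new IllegalArgumentException("bad row: " + row);
            }
            rows.add(fields);
        }
        return rows.toArray(new String[0][]);
    }
}
```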

getNextRecordStarting

private static java.lang.String getNextRecordStarting(java.io.BufferedReader r,
                                                      java.lang.String lineStart)
                                               throws java.io.IOException
Returns the next line/record starting with the specified String, else null at EOF.

Throws:
java.io.IOException
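
The behaviour described above can be sketched as a simple read loop. The class and method names here are hypothetical, and the sketch assumes one record per line.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class NextRecordSketch {
    /** Returns the next line that starts with lineStart, or null at EOF. */
    static String nextStarting(BufferedReader r, String lineStart) throws IOException {
        String line;
        while ((line = r.readLine()) != null) {
            if (line.startsWith(lineStart)) { return line; }
        }
        return null; // End of input reached without a match.
    }
}
```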

main

public static final void main(java.lang.String[] args)
An entry point to load the prefix map and write it out in a more compact format. This may use lossy compression of the loaded values where the saving is large and the implied increase in routing/performance cost is low.

The new map is dumped to the file named as the first argument.

If a source of new input data is supplied as an optional second argument, then it is merged into the existing data in memory, and the dump will be the lossily-compressed result. Only new input data corresponding to ccTLDs explicitly listed in the "geo-proximity" data, i.e. those whose getCloseCCTLDs() result is non-empty, will be retained. This input data file may need to be re-read in multiple passes.


DHD Multimedia Gallery V1.50.55

Copyright (c) 1996-2008, Damon Hart-Davis. All rights reserved.