org.hd.d.pg2k.ai.scorer
Class ScorerCreator

java.lang.Object
  extended by org.hd.d.pg2k.ai.scorer.ScorerCreator

public final class ScorerCreator
extends java.lang.Object

Creates new Scorer (parameter-set) instances given an existing population.

Author:
dhd

Nested Class Summary
static class ScorerCreator.ScorerWork
          Class to encapsulate all background and evolution work for a given ScorerCache.
 
Field Summary
static java.lang.String DT__GEN_DATA
          Data type parameter name for generic data/query; never null nor empty.
static java.lang.String DT_BESTSC
          Data type parameter value to request server's best Scorers; never null nor empty.
static java.lang.String DT_CALIB
          Data type parameter value to request calibration-set data; never null nor empty.
static java.lang.String DT_PARAM_NAME
          Parameter name to request particular remote data type; never null nor empty.
static java.lang.String DT_POSTNEWSC
          Data type parameter value to POST new 'star' Scorer; never null nor empty.
static int MIN_SUGGESTED_CALIB_SET_SIZE
          Minimum suggested calibration-set size; strictly positive.
private static boolean PERF_SAMPLE_CW
          If true then monitor and report performance (CPU usage) in computeWeighting().
private static java.util.concurrent.ConcurrentHashMap<java.lang.StackTraceElement,java.util.concurrent.atomic.AtomicInteger> perfCountsCW
          Cumulative performance stats (if any) for computeWeighting(); null iff none being collected.
static java.lang.String scorerTunnelRRURL
          Tunnel root-relative (starting with '/') URL within the server; never null.
static java.lang.String SSOURCE_GA
          Scorer 'source' for the GA combine/mutate method; never empty nor null.
static java.lang.String SSOURCE_MIN
          Scorer 'source' for the "minimiser" method; never empty nor null.
static java.lang.String SSOURCE_REC
          Scorer 'source' for recovering from persistent/shared store; never empty nor null.
static int SUGGESTED_CALIB_SET_SIZE
          Server-side calibration set size; strictly positive and no less than MIN_SUGGESTED_CALIB_SET_SIZE.
 
Constructor Summary
ScorerCreator()
           
 
Method Summary
static ScoreAndConf computeWeighting(java.util.Map<Name.ExhibitShort,ScoreAndConf> scorerResult, java.util.Map<Name.ExhibitShort,ScoreAndConf> calibrationData)
          Computes ScoreAndConf over the supplied calibration exhibits; never null but may be (0,0) where the scorer is unknown or untested.
static Tuple.Pair<ScoreAndConf,java.lang.Boolean> computeWeighting(java.util.Map<Name.ExhibitShort,ScoreAndConf> calibrationData, ScorerCacheIF scorerCache, ScorerIF scorer, AllExhibitImmutableData aeid, boolean allowStale)
          Computes ScoreAndConf over the supplied calibration exhibits using the supplied Scorer cache; never null but may be (0,0) where the scorer is unknown or untested.
private static ScorerIF constructModifiedSNP(double[] par, ScorerIF initialScorer, java.util.List<ScorerParam> paramDefsAndValuesOriginal, int[] mnIndexToParamIndex)
          Create a variant Scorer given an extant same-type Scorer and the array of values of the variable parameters; never null.
static boolean createNewByGA(ScorerCacheIF cache, ScorerPopulation population, SimpleLoggerIF log)
          Create a new Scorer (parameter set) from existing ones using normal GA techniques.
static boolean createNewByOpt(SimpleExhibitPipelineIF dataSource, ScorerCacheIF cache, ScorerPopulation population, java.util.Map<Name.ExhibitShort,ScoreAndConf> calibSubset, SimpleLoggerIF log, long endTime)
          Create a new Scorer (parameter set) from existing ones using a multi-variable "minimiser".
private static java.util.List<java.lang.String> pickBreedingSet(ScorerCacheIF cache, ScorerPopulation population, boolean allowStale)
          Select at random a best-first breeding set of one Scorer base type; null if none available.
static boolean retrieveOldScorer(SimpleExhibitPipelineIF dataSource, ScorerCacheIF cache, ScorerPopulation population, SimpleLoggerIF log)
          This method attempts to retrieve a "new" Scorer from the persisted history.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SSOURCE_REC

public static final java.lang.String SSOURCE_REC
Scorer 'source' for recovering from persistent/shared store; never empty nor null.

See Also:
Constant Field Values

SSOURCE_GA

public static final java.lang.String SSOURCE_GA
Scorer 'source' for the GA combine/mutate method; never empty nor null.

See Also:
Constant Field Values

SSOURCE_MIN

public static final java.lang.String SSOURCE_MIN
Scorer 'source' for the "minimiser" method; never empty nor null.

See Also:
Constant Field Values

PERF_SAMPLE_CW

private static final boolean PERF_SAMPLE_CW
If true then monitor and report performance (CPU usage) in computeWeighting(). Turn this on for (Scorer) performance tuning only...

This should give a reasonable view of the most compute-intensive Scorer workload.

See Also:
Constant Field Values

perfCountsCW

private static final java.util.concurrent.ConcurrentHashMap<java.lang.StackTraceElement,java.util.concurrent.atomic.AtomicInteger> perfCountsCW
Cumulative performance stats (if any) for computeWeighting(); null iff none being collected.
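The field type above (a ConcurrentHashMap from StackTraceElement to AtomicInteger) suggests sampling-based profiling. Below is a minimal sketch of how such sampling might work, assuming a periodic timer calls a hypothetical sample() method on the thread running computeWeighting(). The class and method names are illustrative only and are not the ScorerCreator implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stack-sampling profiler; a sketch, not the ScorerCreator code. */
final class StackSampleSketch {
    /** How often each frame was seen at the top of a sampled stack. */
    static final ConcurrentHashMap<StackTraceElement, AtomicInteger> counts =
            new ConcurrentHashMap<>();

    /** Record one sample of the given thread's innermost frame, if it has one. */
    static void sample(final Thread t) {
        final StackTraceElement[] stack = t.getStackTrace();
        if (stack.length == 0) { return; } // Thread not started, finished, or no frames.
        // computeIfAbsent is atomic, so concurrent samplers never lose a counter.
        counts.computeIfAbsent(stack[0], k -> new AtomicInteger()).incrementAndGet();
    }

    /** Return the most frequently sampled frame, or null if nothing was sampled. */
    static StackTraceElement hottest() {
        StackTraceElement best = null;
        int bestCount = -1;
        for (final Map.Entry<StackTraceElement, AtomicInteger> e : counts.entrySet()) {
            final int c = e.getValue().get();
            if (c > bestCount) { bestCount = c; best = e.getKey(); }
        }
        return best;
    }
}
```

Frames that appear most often at the top of the stack are where the CPU time is going, which is the "most compute-intensive Scorer workload" view described for PERF_SAMPLE_CW.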


MIN_SUGGESTED_CALIB_SET_SIZE

public static final int MIN_SUGGESTED_CALIB_SET_SIZE
Minimum suggested calibration-set size; strictly positive. Note that if multiple distinct Scorer populations and target exhibit types are to be covered, then a set of this size or greater should be gathered for each of them. It may then be convenient to work with the union of those sets across all Scorer types.

This should be large enough that no one Scorer has enough parameters to trivially map each different calibration exhibit to its correct score.

In general, while using a larger calibration sample set increases CPU and memory costs, especially for remote 'AH' clients that may need to fetch data, a larger calibration set should increase the reliability of calibrated results. But there may be a huge advantage in using a set small enough to keep in cache close to the top of the memory hierarchy.

We may want to try different calibration set sizes larger than the minimum to improve the usefulness and predictive power of the calibration set.

A power of two may allow for generation of slightly better code in various places.

Experience suggests that this value should be at least 128.

See Also:
Constant Field Values
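The sizing advice above (at least 128, preferably a power of two) can be illustrated with a small helper. This is a sketch under assumptions: the floor of 128 is taken from the "at least 128" guidance rather than from the real constant value, and both method names are hypothetical.

```java
/** Hypothetical helpers illustrating the sizing advice; not part of ScorerCreator. */
final class CalibSizeSketch {
    /** Assumed floor, from the "at least 128" guidance rather than the real constant. */
    static final int ASSUMED_MIN = 128;

    /** Round a requested size up to a power of two no smaller than ASSUMED_MIN. */
    static int roundUpCalibSize(final int requested) {
        final int n = Math.max(requested, ASSUMED_MIN);
        final int hi = Integer.highestOneBit(n);
        return (hi == n) ? n : (hi << 1); // Overflows above 2^30; irrelevant for real sizes.
    }

    /**
     * With a power-of-two size a bitmask replaces the modulo when choosing a slot.
     * For non-negative hashes this equals hash % size; for negative ones it still
     * yields a valid slot in [0, size).
     */
    static int slotFor(final int hash, final int powerOfTwoSize) {
        return hash & (powerOfTwoSize - 1);
    }
}
```

The mask-instead-of-modulo trick is one concrete example of the "slightly better code" a power-of-two size can allow.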

SUGGESTED_CALIB_SET_SIZE

public static final int SUGGESTED_CALIB_SET_SIZE
Server-side calibration set size; strictly positive and no less than MIN_SUGGESTED_CALIB_SET_SIZE. A calibration set of this (larger) size costs more memory and CPU time than one of size MIN_SUGGESTED_CALIB_SET_SIZE, but gives a more reliable answer.

Needs to be at least as large as the minimum 'voted-for' set to give good confidence.

A power of two may allow for generation of slightly better code.


scorerTunnelRRURL

public static final java.lang.String scorerTunnelRRURL
Tunnel root-relative (starting with '/') URL within the server; never null.

See Also:
Constant Field Values

DT_PARAM_NAME

public static final java.lang.String DT_PARAM_NAME
Parameter name to request particular remote data type; never null nor empty.

See Also:
Constant Field Values

DT_CALIB

public static final java.lang.String DT_CALIB
Data type parameter value to request calibration-set data; never null nor empty.

See Also:
Constant Field Values

DT_BESTSC

public static final java.lang.String DT_BESTSC
Data type parameter value to request server's best Scorers; never null nor empty.

See Also:
Constant Field Values

DT_POSTNEWSC

public static final java.lang.String DT_POSTNEWSC
Data type parameter value to POST new 'star' Scorer; never null nor empty.

See Also:
Constant Field Values

DT__GEN_DATA

public static final java.lang.String DT__GEN_DATA
Data type parameter name for generic data/query; never null nor empty.

See Also:
Constant Field Values
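The tunnel URL and data-type constants above combine into a request to the server. Below is a minimal sketch, assuming the data type is passed as a query parameter on a GET request. The constant values are placeholders (the real ones are on the Constant Field Values page), and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical request-URL builder; all constant values below are placeholders. */
final class TunnelUrlSketch {
    // Placeholder values only; not the real constants.
    static final String SCORER_TUNNEL_RR_URL = "/scorerTunnel"; // hypothetical, root-relative
    static final String DT_PARAM_NAME = "dt";                   // hypothetical
    static final String DT_CALIB = "calib";                     // hypothetical

    /** Build an absolute URL asking the server for one data type. */
    static String requestUrl(final String serverBase, final String dataType) {
        // Drop any trailing '/' from the base so the root-relative path joins cleanly.
        final String base = serverBase.endsWith("/")
                ? serverBase.substring(0, serverBase.length() - 1) : serverBase;
        return base + SCORER_TUNNEL_RR_URL + "?"
                + URLEncoder.encode(DT_PARAM_NAME, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(dataType, StandardCharsets.UTF_8);
    }
}
```

Encoding the parameter value keeps arbitrary data-type strings safe in the query; posting a new 'star' Scorer (DT_POSTNEWSC) would instead send its payload in a POST body.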

Constructor Detail

ScorerCreator

public ScorerCreator()

Method Detail

retrieveOldScorer

public static boolean retrieveOldScorer(SimpleExhibitPipelineIF dataSource,
                                        ScorerCacheIF cache,
                                        ScorerPopulation population,
                                        SimpleLoggerIF log)
                                 throws java.io.IOException
This method attempts to retrieve a "new" Scorer from the persisted history. This mechanism is used to recover previously known-good Scorers from local store and/or to share good Scorers between all participating servers. This makes for faster start-up of new server instances, gives both a local and a global memory of good genotypes, and allows good genotypes to be shared between local populations.

Returns:
true if a Scorer was found that was not already in the extant population (though it may not have been good enough to remain in it)
Throws:
java.io.IOException
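The return-value contract above (true when something new was found, even if it was then rejected) can be sketched as follows. This assumes Scorers are identified by hypothetical String keys and that a caller-supplied predicate stands in for the population's admission test; it is not the actual retrieval logic.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of recovering a "new" Scorer from persisted history. */
final class RecoverSketch {
    /**
     * Look for a persisted Scorer not already in the population; if one is found,
     * offer it to the population, which keeps it only if goodEnough accepts it.
     * Returns true iff a not-already-present Scorer was found, even if rejected.
     */
    static boolean tryRecover(final List<String> persistedHistory,
                              final Set<String> population,
                              final Predicate<String> goodEnough) {
        for (final String candidate : persistedHistory) {
            if (population.contains(candidate)) { continue; } // Already live: not "new".
            if (goodEnough.test(candidate)) { population.add(candidate); }
            return true; // Found something new, whether or not it was kept.
        }
        return false;
    }
}
```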

createNewByGA

public static boolean createNewByGA(ScorerCacheIF cache,
                                    ScorerPopulation population,
                                    SimpleLoggerIF log)
                             throws java.io.IOException
Create a new Scorer (parameter set) from existing ones using normal GA techniques. This uses selection, mutation and recombination to attempt to find a "better" Scorer version based on those in the existing population.

Parameters:
population - the extant population; never null
Returns:
true iff we created and evaluated a new Scorer not in the extant population
Throws:
java.io.IOException
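Recombination and mutation, two of the GA steps named above, are sketched below over integer parameter vectors. The specific operators (uniform crossover and a clamped +/-1 step mutation) are common textbook choices assumed for illustration; the source does not say which operators createNewByGA() uses, and all names here are hypothetical.

```java
import java.util.Random;

/** Hypothetical GA operators over integer parameter vectors; not the ScorerCreator code. */
final class GaSketch {
    /** Uniform crossover: each gene comes from either parent with equal probability. */
    static int[] crossover(final int[] a, final int[] b, final Random rnd) {
        if (a.length != b.length) { throw new IllegalArgumentException("same Scorer type required"); }
        final int[] child = new int[a.length];
        for (int i = 0; i < a.length; ++i) { child[i] = rnd.nextBoolean() ? a[i] : b[i]; }
        return child;
    }

    /** Mutate each gene with probability rate by +/-1, clamped to [min[i], max[i]]. */
    static int[] mutate(final int[] genes, final int[] min, final int[] max,
                        final double rate, final Random rnd) {
        final int[] out = genes.clone(); // Leave the parent untouched.
        for (int i = 0; i < out.length; ++i) {
            if (rnd.nextDouble() < rate) {
                final int step = rnd.nextBoolean() ? 1 : -1;
                out[i] = Math.max(min[i], Math.min(max[i], out[i] + step));
            }
        }
        return out;
    }
}
```

Selection (choosing which parents to combine) is the third step; a weighted choice of that kind is sketched under pickBreedingSet.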

constructModifiedSNP

private static ScorerIF constructModifiedSNP(double[] par,
                                             ScorerIF initialScorer,
                                             java.util.List<ScorerParam> paramDefsAndValuesOriginal,
                                             int[] mnIndexToParamIndex)
Create a variant Scorer given an extant same-type Scorer and the array of values of the variable parameters; never null.
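The signature above suggests that par holds only the variable parameters and that mnIndexToParamIndex maps each minimiser slot back to a position in the full parameter list. Below is a minimal sketch of that mapping, assuming the variable parameters are integers (so minimiser values are rounded) and representing the parameter list as a plain int[]. It does not clamp to each parameter's legal range, which a full version may need to.

```java
/** Hypothetical sketch of mapping a minimiser vector back onto a full parameter list. */
final class ModifiedParamsSketch {
    /**
     * Copy the original parameter values and overwrite only the variable ones:
     * par[k] is the minimiser's continuous value for the parameter at index
     * mnIndexToParamIndex[k], rounded to the nearest integer.
     */
    static int[] applyMinimiserVector(final double[] par, final int[] originalValues,
                                      final int[] mnIndexToParamIndex) {
        if (par.length != mnIndexToParamIndex.length) {
            throw new IllegalArgumentException("par and index map must be the same length");
        }
        final int[] values = originalValues.clone(); // Never mutate the extant Scorer's values.
        for (int k = 0; k < par.length; ++k) {
            values[mnIndexToParamIndex[k]] = (int) Math.round(par[k]);
        }
        return values;
    }
}
```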


createNewByOpt

public static boolean createNewByOpt(SimpleExhibitPipelineIF dataSource,
                                     ScorerCacheIF cache,
                                     ScorerPopulation population,
                                     java.util.Map<Name.ExhibitShort,ScoreAndConf> calibSubset,
                                     SimpleLoggerIF log,
                                     long endTime)
                              throws java.io.IOException
Create a new Scorer (parameter set) from existing ones using a multi-variable "minimiser". This selects a "good" Scorer and then attempts to find a better/optimal one derived from it using mathematical error-minimisation techniques.

For the moment this only attempts to adjust the continuously-variable properties (ie the integer parameter values) and leaves the enumeration parameters alone.

Parameters:
population - the extant population; never null
calibSubset - if non-null, should be non-empty and is a calibration set to optimise/minimise against
endTime - target end time to stop computations at
Returns:
true iff we created and evaluated a new Scorer not in the extant population
Throws:
java.io.IOException
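A deadline-bounded error minimisation over integer parameters is sketched below. The technique shown, coordinate-wise hill climbing, is a simple stand-in chosen for illustration; the source does not name the minimiser it actually uses. The error function is a hypothetical callback that would, in practice, score a candidate against the calibration subset.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical deadline-bounded integer hill-climber; a stand-in for the real "minimiser". */
final class MinimiserSketch {
    /**
     * Try +1 and -1 on each parameter in turn, keeping any step that lowers the
     * error, until no step helps or endTime (ms since the epoch, as returned by
     * System.currentTimeMillis()) is reached. Returns the best vector found.
     */
    static int[] minimise(final int[] start, final ToDoubleFunction<int[]> error,
                          final long endTime) {
        int[] best = start.clone();
        double bestErr = error.applyAsDouble(best);
        boolean improved = true;
        while (improved && System.currentTimeMillis() < endTime) {
            improved = false;
            for (int i = 0; i < best.length; ++i) {
                for (final int step : new int[]{1, -1}) {
                    final int[] trial = best.clone();
                    trial[i] += step;
                    final double e = error.applyAsDouble(trial);
                    if (e < bestErr) { best = trial; bestErr = e; improved = true; }
                }
            }
        }
        return best;
    }
}
```

Checking the deadline once per full pass keeps the overhead low, at the cost of possibly overrunning endTime by up to one pass of error evaluations.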

pickBreedingSet

private static java.util.List<java.lang.String> pickBreedingSet(ScorerCacheIF cache,
                                                                ScorerPopulation population,
                                                                boolean allowStale)
Select at random a best-first breeding set of one Scorer base type; null if none available. Returns the non-empty breeding set of a parameterised Scorer selected at random, or null if no such breeding set is available.

We weight this more heavily towards breeding sets whose best constituent has a high "goodness".

Parameters:
population - the extant population; never null
Returns:
null or in-order (best-first) non-empty breeding set of same Scorer type
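One way to weight the random choice towards breeding sets whose best member has high "goodness" is fitness-proportionate (roulette-wheel) selection, sketched below. That this is the weighting pickBreedingSet() uses is an assumption; the goodness callback and String member keys are hypothetical.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical fitness-proportionate (roulette-wheel) choice of a breeding set. */
final class BreedingPickSketch {
    /**
     * Pick one non-empty, best-first breeding set with probability proportional to
     * the goodness of its first (best) member; null if no set has positive weight.
     */
    static List<String> pick(final List<List<String>> breedingSets,
                             final ToDoubleFunction<String> goodness, final Random rnd) {
        double total = 0;
        for (final List<String> set : breedingSets) {
            if (!set.isEmpty()) { total += Math.max(0, goodness.applyAsDouble(set.get(0))); }
        }
        if (total <= 0) { return null; }
        double r = rnd.nextDouble() * total;
        List<String> lastPositive = null;
        for (final List<String> set : breedingSets) {
            if (set.isEmpty()) { continue; }
            final double w = Math.max(0, goodness.applyAsDouble(set.get(0)));
            if (w <= 0) { continue; }
            lastPositive = set;
            if (r < w) { return set; }
            r -= w;
        }
        return lastPositive; // Guards against floating-point rounding at the upper end.
    }
}
```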

computeWeighting

public static Tuple.Pair<ScoreAndConf,java.lang.Boolean> computeWeighting(java.util.Map<Name.ExhibitShort,ScoreAndConf> calibrationData,
                                                                          ScorerCacheIF scorerCache,
                                                                          ScorerIF scorer,
                                                                          AllExhibitImmutableData aeid,
                                                                          boolean allowStale)
                                                                   throws java.io.IOException
Computes ScoreAndConf over the supplied calibration exhibits using the supplied Scorer cache; never null but may be (0,0) where the scorer is unknown or untested. This can be used as part of the computation of a Scorer's weighting. The input data must not change while this routine is running.

Parameters:
calibrationData - map from short exhibit names to external calibration data (eg actual user vote-based score); never null
Returns:
the score (first in the Pair) represents the correlation with the underlying votes (or whatever else the scoring is measured against): MAX means perfect correlation, 0 means no correlation, and -MAX means perfectly wrong answers all the time. The confidence is 0 if we have no (or too few) data points and approaches MAX as we have a large enough number of data points. The second element of the Pair is true if the value is "partial" (we had to stop calculation prematurely or did not have enough data points), and so, for example, should not be cached.
Throws:
java.io.IOException
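The "partial" flag exists so that callers do not cache incomplete answers. Below is a minimal, single-threaded sketch of a memo that honours that rule; the Result class mirrors the (score, confidence, partial) triple described above, while the String Scorer key and the compute callback are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical memo of weighting results that refuses to cache "partial" answers. */
final class WeightingCacheSketch {
    /** A computed weighting plus its "partial" flag, mirroring the Pair returned above. */
    static final class Result {
        final double score, confidence;
        final boolean partial;
        Result(final double score, final double confidence, final boolean partial) {
            this.score = score; this.confidence = confidence; this.partial = partial;
        }
    }

    private final Map<String, Result> cache = new HashMap<>();
    int computations; // Number of calls made to the underlying (expensive) computation.

    /** Return a cached complete result for the Scorer key, or compute (and maybe cache) one. */
    Result get(final String scorerKey, final Function<String, Result> compute) {
        final Result cached = cache.get(scorerKey);
        if (cached != null) { return cached; }
        ++computations;
        final Result r = compute.apply(scorerKey);
        if (!r.partial) { cache.put(scorerKey, r); } // Partial answers must not be cached.
        return r;
    }
}
```

Because a partial result is never stored, the next request for the same Scorer recomputes it, giving the calculation another chance to complete.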

computeWeighting

public static ScoreAndConf computeWeighting(java.util.Map<Name.ExhibitShort,ScoreAndConf> scorerResult,
                                            java.util.Map<Name.ExhibitShort,ScoreAndConf> calibrationData)
Computes ScoreAndConf over the supplied calibration exhibits; never null but may be (0,0) where the scorer is unknown or untested. This can be used as part of the computation of a Scorer's weighting.

All keys in the calibration data should map to non-null values in the ratings map, scorerResult (ie the ratings keys can be a superset of the calibration keys); missing values will diminish the confidence of the result.

The input data must not change while this routine is running.

Parameters:
scorerResult - map from short exhibit name to Scorer's rating/result for the exhibit; never null
calibrationData - map from short exhibit name to external calibration data (eg actual user vote-based score); never null
Returns:
the score represents the correlation with the underlying votes (or whatever else the scoring is measured against): MAX means perfect correlation, 0 means no correlation, and -MAX means perfectly wrong answers all the time. The confidence is 0 if we have no (or too few) data points and approaches MAX as we have a large enough number of data points.
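A correlation-based weighting with a coverage-sensitive confidence is sketched below. Pearson correlation is used as one plausible measure; the source does not say which correlation the real method computes. MAX is taken as 1.0, scores are plain doubles rather than ScoreAndConf, and the minimum point count and confidence curve are hypothetical choices that merely match the documented shape: 0 for too few points, rising towards MAX with more points, and reduced by missing ratings.

```java
import java.util.Map;

/** Hypothetical correlation-based weighting; MAX is taken as 1.0 for illustration. */
final class WeightingSketch {
    static final double MAX = 1.0;          // Assumed scale; the real ScoreAndConf MAX may differ.
    static final int MIN_POINTS = 3;        // Hypothetical "too few data points" threshold.
    static final double HALF_CONF_N = 16.0; // Hypothetical n at which confidence reaches half its ceiling.

    /** Returns {score, confidence}: Pearson r scaled to [-MAX, MAX], and a coverage-based confidence. */
    static double[] computeWeighting(final Map<String, Double> scorerResult,
                                     final Map<String, Double> calibrationData) {
        int n = 0;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (final Map.Entry<String, Double> e : calibrationData.entrySet()) {
            final Double x = scorerResult.get(e.getKey());
            if (x == null) { continue; } // A missing rating just reduces coverage.
            final double y = e.getValue();
            ++n; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
        }
        if (n < MIN_POINTS) { return new double[]{0, 0}; } // Unknown or untested.
        final double cov = sxy - sx * sy / n;
        final double vx = sxx - sx * sx / n, vy = syy - sy * sy / n;
        if (vx <= 0 || vy <= 0) { return new double[]{0, 0}; } // Constant input: nothing to correlate.
        final double r = cov / Math.sqrt(vx * vy);
        final double coverage = n / (double) calibrationData.size();
        final double confidence = MAX * coverage * n / (n + HALF_CONF_N);
        return new double[]{MAX * r, confidence};
    }
}
```

Correlation is insensitive to scale and offset, so a Scorer whose outputs are on a different range from the vote-based scores can still earn a full +MAX, as with ratings of 10, 20, 30, 40 against calibration values 1, 2, 3, 4.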

DHD Multimedia Gallery V1.60.69

Copyright (c) 1996-2012, Damon Hart-Davis. All rights reserved.