Class fileindexer

Description

The file indexer class.

This class indexes files on disc, either one by one or as a whole file hierarchy tree.

Located in /search-fileindex-defs.php (line 37)

Variable Summary
 mixed $application
 mixed $field_definitions
 mixed $host
 mixed $idoffset
 mixed $idprefix
 mixed $idsource
 mixed $indexfields
 mixed $ixid
 mixed $metascan
 mixed $meta_fields
 mixed $nice_msecs
 mixed $port
 mixed $timeoutsecs
 mixed $timer
Method Summary
 fileindexer fileindexer ([string $application = "?"], [string $host = ""], [string $port = ""])
 void define_field (string $fieldname, string $type, [boolean $stored = STORED], [boolean $indexed = INDEXED])
 void id_generate ([integer $idsource = ID_FROM_INC], [mixed $pfxofs = ""])
 void index_field (string $fieldname, string $fieldvalue)
 void index_file (string $path, string $id, [mixed $fields = false])
 array index_tree (string $path, [string $patt = ""], [string $mode = "dir"], integer $nice_msecs)
 void meta_field (string $fieldname, string $type, [boolean $stored = STORED], [boolean $indexed = INDEXED])
 void noscantags ()
 void scantags ()
Variables
mixed $application = "" (line 40)

Application we are indexing for

mixed $field_definitions = array() (line 63)

Index fields definitions array. Contains definitions

mixed $host = "" (line 42)

Host to connect to

mixed $idoffset = 0 (line 71)

ID generation offset

mixed $idprefix = "" (line 74)

ID generation prefix

mixed $idsource = ID_FROM_INC (line 52)

ID generation source

mixed $indexfields = array() (line 68)

Fields for indexing. This is an array of fieldname/value pairs.

mixed $ixid (line 49)

The index ID

mixed $metascan = true (line 55)

Scan for meta tags as fields in file content. Recommended.

mixed $meta_fields = array() (line 59)

Meta fields definitions array. Contains definitions

mixed $nice_msecs = 0 (line 81)

Number of milliseconds to wait nicely between indexing calls.

mixed $port = "" (line 44)

Port to connect to

mixed $timeoutsecs = "" (line 78)

Timeout for indexing commands in seconds (can usually leave

mixed $timer (line 84)

Indexing execution timer

Methods
Constructor fileindexer (line 93)

Constructor

Create a new file indexer

fileindexer fileindexer ([string $application = "?"], [string $host = ""], [string $port = ""])
  • string $application: Application name
  • string $host: Hostname or IP of search engine server
  • string $port: Port of search engine server
define_field (line 114)

Define a field. We supply the name of the field, its type (Text, Date or Id), and whether it should be stored by the search engine for later retrieval in queries. For example, you would not store the raw document content, as this is usually stored elsewhere.

IMPORTANT NOTE: Fields defined here will automatically be included as meta fields.

  • see: meta_field()
void define_field (string $fieldname, string $type, [boolean $stored = STORED], [boolean $indexed = INDEXED])
  • string $fieldname: Name of the field to index
  • string $type: Type of field data: Text, Date or Id.
  • boolean $stored: If true then search engine will store the content itself
  • boolean $indexed: If true then search engine will index the field content
id_generate (line 171)

Set the source for ID generation. Since we are indexing a bunch of files, the IDs have to be generated on demand inside the loop. We provide several ways of doing this here, and you can extend this class to provide more if required.

Main ways:
  • ID_FROM_INC: Increment a counter by 1 each time (with offset)
  • ID_FROM_NAME: Take the filename, strip the extension, add prefix
  • ID_FROM_FILENAME: Take the full filename, add prefix
  • ID_FROM_PATH: Take the full file path

NB: These are all defined as integer constants.

void id_generate ([integer $idsource = ID_FROM_INC], [mixed $pfxofs = ""])
  • integer $idsource: Source of ID generation
  • mixed $pfxofs: String prefix, or integer offset
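
The four ID sources can be illustrated with a small standalone sketch. It does not call the class. The function sketch_make_id() and the SKETCH_* constants are hypothetical stand-ins (the real ID_FROM_* constants are integers whose values are not given here), and the way the offset and prefix are applied is an assumption based on the descriptions on this page.

```php
<?php
// Hypothetical stand-ins for the library's ID_FROM_* integer constants.
const SKETCH_ID_FROM_INC      = 0;
const SKETCH_ID_FROM_NAME     = 1;
const SKETCH_ID_FROM_FILENAME = 2;
const SKETCH_ID_FROM_PATH     = 3;

/**
 * Derive an ID for one file.
 * $pfxofs is an integer start value for the counter mode, or a string
 * prefix for the name-based modes. $count is the zero-based position
 * of the file within the indexing run.
 */
function sketch_make_id(int $idsource, string $path, $pfxofs = "", int $count = 0): string
{
    switch ($idsource) {
        case SKETCH_ID_FROM_NAME:
            // Filename without directory or extension, plus prefix.
            return $pfxofs . pathinfo($path, PATHINFO_FILENAME);
        case SKETCH_ID_FROM_FILENAME:
            // Filename including extension, plus prefix.
            return $pfxofs . basename($path);
        case SKETCH_ID_FROM_PATH:
            // Full path exactly as given (no prefix applied in this sketch).
            return $path;
        case SKETCH_ID_FROM_INC:
        default:
            // Counter advancing by 1 per file; starts at 1 unless a
            // numeric start value was supplied (assumed behaviour).
            $start = is_numeric($pfxofs) ? (int) $pfxofs : 1;
            return (string) ($start + $count);
    }
}

echo sketch_make_id(SKETCH_ID_FROM_NAME, "/docs/guide/intro.html", "doc_"), "\n"; // prints "doc_intro"
```

With the real class, the equivalent choice would be made once through id_generate() before indexing starts.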
index_field (line 153)

Supply field content for indexing. This causes the search engine to take the given fieldname and index the given value against it.

The field name can have the field type included in the form 'Foo:Date', where 'Date' is the type in this instance. In fact, since 'Text' is the default field type, 'Date' is probably the only one you need to use as the current implementation stands.

void index_field (string $fieldname, string $fieldvalue)
  • string $fieldname: Name of the field to index.
  • string $fieldvalue: Content of the field to index
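
The 'Foo:Date' naming convention can be sketched as a simple split on the colon. This is an illustration only: sketch_split_fieldname() is a hypothetical helper, not a method of this class, and the library's own parsing may differ in detail.

```php
<?php
/**
 * Split a field name of the form 'Name:Type' into [name, type].
 * A name without a type suffix falls back to the default type 'Text'.
 */
function sketch_split_fieldname(string $fieldname): array
{
    $parts = explode(":", $fieldname, 2);
    $name  = $parts[0];
    $type  = (isset($parts[1]) && $parts[1] !== "") ? $parts[1] : "Text";
    return [$name, $type];
}

list($name, $type) = sketch_split_fieldname("Published:Date");
echo "$name is a $type field\n"; // prints "Published is a Date field"
```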
index_file (line 211)

Index a file located at the given path, using the given ID.

You can also use the parameter $fields to supply an array of fieldname/value pairs to index with this file, for one-off indexing of files. If the fieldname is a date field, make sure to define the name as 'Foo:Date', to cause the field definition to be correct.

void index_file (string $path, string $id, [mixed $fields = false])
  • string $path: Path to the file to index
  • string $id: ID to associate with the indexed file content
  • mixed $fields: Array of field/values to index with file
index_tree (line 350)

Index a tree of files starting at the path given. We index these in one of four modes, which determines how we generate the ID for each item:

  • 'ID_FROM_INC' mode uses an incremental counter starting at 1. If $prefix holds a number, the counter will start at this number instead of one. Each item has an ID incremented by one from the last one.
  • 'ID_FROM_NAME' mode uses the filename, stripped of any path and extension, as the ID. If prefix is not an empty string, then it is prefixed to every filename ID.
  • 'ID_FROM_FILENAME' mode uses the filename, including any extension, as the ID. If prefix is not an empty string, then it is prefixed to every filename ID.
  • 'ID_FROM_PATH' mode uses the full path to the item being indexed as the ID. If prefix is not an empty string, then it is prefixed to every filename ID.

The file will simply be indexed as a single Text field, with the appropriate ID, and no other index fields unless $metascan is set to TRUE. If this is the case, the system will scan the file for HTML meta tags of the form '<meta name="foo" content="bar">'. In this example a field of name 'foo' would be given value 'bar'.

  • return: List of 3 counts: $done, $succeeded, $failed
array index_tree (string $path, [string $patt = ""], [string $mode = "dir"], integer $nice_msecs)
  • string $path: Path to the head of the file tree to index
  • integer $nice_msecs: Time to nicely wait between index calls
  • string $patt: Pattern to match, e.g. '*.html'
  • string $mode: "file": read $path as file of paths, "dir": recurse $path as dir
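
The "dir" mode behaviour (recurse, filter by pattern, pause between calls, return three counts) can be approximated with the standard SPL iterators. This is a sketch under stated assumptions, not the library's implementation: sketch_index_tree() and its $indexfn callback are hypothetical, the pattern is matched against the bare filename with fnmatch(), and the "file" mode is not covered.

```php
<?php
/**
 * Walk $path recursively, pass every file matching $patt to $indexfn,
 * and return the three counts [$done, $succeeded, $failed].
 * $indexfn is a hypothetical callback that returns true on success.
 */
function sketch_index_tree(string $path, string $patt, callable $indexfn, int $nice_msecs = 0): array
{
    $done = $succeeded = $failed = 0;
    $walker = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($walker as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // An empty pattern matches every file.
        if ($patt !== "" && !fnmatch($patt, $file->getFilename())) {
            continue;
        }
        $done++;
        if ($indexfn($file->getPathname())) {
            $succeeded++;
        } else {
            $failed++;
        }
        if ($nice_msecs > 0) {
            usleep($nice_msecs * 1000); // milliseconds to microseconds
        }
    }
    return [$done, $succeeded, $failed];
}
```

A caller would unpack the result with list($done, $succeeded, $failed), which mirrors the return value documented above.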
meta_field (line 136)

Define a field as a meta tag. This ensures that the field will be picked up from the file meta tags, if present. If it is not listed here then it will be ignored.

IMPORTANT NOTE: We define the strict rule that ONLY fields which have been defined here can be added to the indexing via the meta tag scanning. I.e. you must define fields here explicitly, or via the define_field() method, or they will be ignored even if they turn up as a meta tag. This is so we can restrict the indexing, and be sure of field types.

void meta_field (string $fieldname, string $type, [boolean $stored = STORED], [boolean $indexed = INDEXED])
  • string $fieldname: Name of the field to process as meta tag
  • string $type: Type of field data: Text, Date or Id.
  • boolean $stored: If true then search engine will store the content itself
  • boolean $indexed: If true then search engine will index the field content
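
The strict rule above amounts to a whitelist applied to the scanned tags. The following is a minimal sketch of that idea, assuming a simplified scanner: sketch_scan_meta() is hypothetical, it only recognises double-quoted attributes with name before content, and it compares field names case-sensitively, none of which is confirmed for the real class.

```php
<?php
/**
 * Extract <meta name="..." content="..."> pairs from $html, keeping only
 * the names listed in $defined (the fields registered beforehand).
 */
function sketch_scan_meta(string $html, array $defined): array
{
    $fields = [];
    $regex  = '/<meta\s+name="([^"]+)"\s+content="([^"]*)"\s*\/?>/i';
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Undefined field names are silently ignored.
            if (in_array($m[1], $defined, true)) {
                $fields[$m[1]] = $m[2];
            }
        }
    }
    return $fields;
}

$html = '<meta name="author" content="Jane"><meta name="robots" content="noindex">';
print_r(sketch_scan_meta($html, ["author"]));
```

Here only 'author' is returned, because 'robots' was never defined as a field.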
noscantags (line 197)

Flag that we should NOT do a tag scan on the content of the files.

void noscantags ()
scantags (line 190)

Flag that we should do a tag scan on the content of the files to try and extract fields to index. Note that any tags thus found will only be used if the field name has been defined with the method define_field(). This causes both the <title> tag and <meta> tags to be considered.

void scantags ()

Documentation generated by phpDocumentor 1.3.0RC3