Man page of tw_learn(3) and tw_learn_file(3)

NAME

tw_learn, tw_learn_file - learn characteristics of a category

SYNOPSIS

C/C++ #include <tw.h>

 tw_errno_t tw_learn(tw_t *tw, const char *cat, const char *str);

 tw_errno_t tw_learn_file(tw_t *tw, const char *cat, const char *path);

DESCRIPTION

tw_learn() and tw_learn_file() analyse a document's content and learn how to assign similar documents to the given category and to its top-level categories.

tw_learn() processes strings while tw_learn_file() handles documents stored within the file system.
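When a document lives on disk but has to be pre-processed first (for example, reduced to plain text), one option is to load it into memory and hand the resulting string to tw_learn(). The following is a minimal sketch in standard C; read_text_file() is a hypothetical helper and not part of the Textweiser API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the Textweiser API): reads a whole file
 * into a NUL-terminated heap buffer. Returns NULL on failure; the caller
 * frees the result with free(). */
static char *read_text_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n;
    /* Always keep one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                      /* read error, not end of file */
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[len] = '\0';
    return buf;
}
```

The returned buffer can be passed as the str argument of tw_learn() and released afterwards. If no pre-processing is needed, calling tw_learn_file() with the path is the simpler choice.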

PARAMETERS

Both tw_learn() and tw_learn_file() have the first two parameters in common:

tw (tw_t *)

Pointer to an initialized Textweiser object.

cat (const char *)

Name of the category the document is an example of.

tw_learn() expects as a third parameter:

str (const char *)

The document's content as a string.

tw_learn_file() expects as a third parameter:

path (const char *)

The path to the document within the file system.

RETURN VALUE

Both tw_learn() and tw_learn_file() return an error indicator (tw_errno_t). A return value of TW_OK indicates success; any other value identifies the error that occurred.

The function tw_strerror(3) can be used to obtain a natural language error message.
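The return value check described above might look like the following sketch. Several things here are assumptions: the signature of tw_strerror() (taken to accept a tw_errno_t and return a string), the USE_TEXTWEISER macro, and the wrapper learn_or_report(). The stand-in declarations in the #else branch exist only so that the sketch compiles without the library; their values and behaviour are invented and do not reflect the real <tw.h>.

```c
#include <stdio.h>

#ifdef USE_TEXTWEISER            /* hypothetical switch: build against the library */
#include <tw.h>
#else
/* Stand-ins for illustration only. The real definitions come from <tw.h>;
 * the value of TW_OK and the function bodies below are invented. */
typedef struct tw tw_t;
typedef int tw_errno_t;
#define TW_OK 0

static tw_errno_t tw_learn(tw_t *tw, const char *cat, const char *str)
{
    (void)tw;
    (void)cat;
    /* Pretend that empty input is an error. */
    return (str != NULL && str[0] != '\0') ? TW_OK : 1;
}

static const char *tw_strerror(tw_errno_t err)   /* assumed signature */
{
    return err == TW_OK ? "success" : "stand-in error";
}
#endif

/* Hypothetical wrapper: learn one document as an example of `cat` and
 * report a failure on stderr. Returns 0 on success, -1 on error. */
static int learn_or_report(tw_t *tw, const char *cat, const char *doc)
{
    tw_errno_t err = tw_learn(tw, cat, doc);

    if (err != TW_OK) {
        fprintf(stderr, "tw_learn(%s): %s\n", cat, tw_strerror(err));
        return -1;
    }
    return 0;
}
```

The same pattern applies to tw_learn_file(), with the path in place of the document string.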

NOTES

o  The input must be plain text and should be in a supported language; see Textweiser's User Manual for details.

o  The input must be encoded in UTF-8. If the document uses a different encoding, the error code TW_ENOSUTF ("Not a supported Unicode Transformation Format") is returned.

o  A category must exist before documents can be learned as examples of it. Create it in advance using tw_add_category(3) or tw-admin(1).

o  It is recommended to train each category with at least ten appropriate documents. Once learning is complete, the database can be optimized using either tw-admin(1) or tw_optimize_db(3) to speed up classification tasks.
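Because input in another encoding is rejected with TW_ENOSUTF, an application may want to check a string before passing it to tw_learn(). The following sketch shows one way to test for well-formed UTF-8 in plain C. is_valid_utf8() is a hypothetical helper, not part of the Textweiser API, and it is an assumption that the library applies comparable rules; the library's own check remains authoritative.

```c
#include <stddef.h>

/* Hypothetical pre-check (not part of the Textweiser API): returns 1 if the
 * NUL-terminated string is well-formed UTF-8, 0 otherwise. Rejects overlong
 * forms, UTF-16 surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        unsigned long cp;
        size_t extra;

        if (*p < 0x80) {                    /* single-byte (ASCII) */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {   /* lead byte of 2-byte sequence */
            cp = *p & 0x1F;
            extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {   /* lead byte of 3-byte sequence */
            cp = *p & 0x0F;
            extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {   /* lead byte of 4-byte sequence */
            cp = *p & 0x07;
            extra = 3;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }
        p++;

        for (size_t i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)        /* also catches a premature NUL */
                return 0;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if ((extra == 1 && cp < 0x80) ||    /* overlong encodings */
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||   /* surrogates */
            cp > 0x10FFFF)
            return 0;
    }
    return 1;
}
```

A document that fails this check, for instance one stored as ISO-8859-1, would need to be converted to UTF-8 before learning.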

SEE ALSO

tw-learn(1), tw-admin(1)

tw_add_category(3), tw_free(3), tw_strerror(3), tw_optimize_db(3)

Textweiser User Manual

http://www.lingua-systems.com/text-classifier/textweiser-library/