Search Appliance SBE
Data is submitted to the Search Appliance with an HTTP POST request sent to the same URL as the admin interface (e.g. http://.../dowalk), but with /recvdata.xml appended. E.g.:
http://www.example.com/texis/dowalk/recvdata.xml
The following POST variables must be set in the request. Be sure to URL-encode the values:
profile
Set to the name of the receiving profile.
data
Set to an XML document containing the data and what to do with it (insert/delete/etc.). See below for details.
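As a minimal sketch, assuming Python's standard urllib and a hypothetical profile name myprofile, the request can be assembled as a normal form POST. urlencode() URL-encodes both variables, so the XML in data does not need to be escaped by hand. The helper name build_recvdata_request is illustrative only.

```python
import urllib.parse
import urllib.request


def build_recvdata_request(admin_url: str, profile: str, data_xml: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to <admin_url>/recvdata.xml."""
    # recvdata.xml is appended to the admin interface URL.
    url = admin_url.rstrip("/") + "/recvdata.xml"
    # urlencode() URL-encodes both names and values as a standard form body.
    body = urllib.parse.urlencode({"profile": profile, "data": data_xml}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# "myprofile" is a hypothetical profile name.
req = build_recvdata_request(
    "http://www.example.com/texis/dowalk",
    "myprofile",
    '<?xml version="1.0" encoding="UTF-8"?><ThunderstoneReplication>...</ThunderstoneReplication>',
)
# urllib.request.urlopen(req) would actually send the request.
```

The request is only built here, not sent, so it can be inspected before it reaches the appliance.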
Uploading content
Below is an example data document in which all fields are specified. Be sure to HTML-encode values (e.g. & as &amp; and < as &lt;).
<?xml version="1.0" encoding="UTF-8"?>
<ThunderstoneReplication xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <Item>
    <Type>I</Type>
    <Size>150369</Size>
    <Visited>2005-10-25 15:25:18</Visited>
    <Dlsecs>0</Dlsecs>
    <Depth>0</Depth>
    <Url>http://www.example.com/dir/page.html</Url>
    <Title>Sprocket Specifications</Title>
    <Body>...</Body>
    <Keywords>sprockets, gears, hubs</Keywords>
    <Description>Sprocket details</Description>
    <Meta></Meta>
    <Category>Mechanical</Category>
    <Modified>2005-10-25 11:21:07</Modified>
    <NextCheck>2005-10-25 16:25:18</NextCheck>
    <Views>0</Views>
    <Clicks>0</Clicks>
    <CTR>0.000000</CTR>
    <Pop>0</Pop>
    <MimeType>text/html</MimeType>
    <Charset>UTF-8</Charset>
    <Refs dt:dt="bin.base64">...</Refs>
    <Errors dt:dt="bin.base64">...</Errors>
    <RawData dt:dt="bin.base64"></RawData>
  </Item>
</ThunderstoneReplication>
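Generating the document with an XML library avoids hand-encoding mistakes. The sketch below uses Python's xml.etree.ElementTree, which escapes &, < and > in element text when serializing. It fills in only a few of the fields shown above; whether the appliance accepts an Insert with the other fields omitted is an assumption, not something stated here. The function name build_insert_document is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_insert_document(url: str, title: str, body: str) -> bytes:
    """Return a UTF-8 ThunderstoneReplication document with one Insert item."""
    root = ET.Element("ThunderstoneReplication")
    item = ET.SubElement(root, "Item")
    # Only a subset of the example's fields (assumed to be acceptable).
    for tag, value in (("Type", "I"), ("Url", url), ("Title", title), ("Body", body)):
        # ElementTree escapes &, < and > in text on serialization.
        ET.SubElement(item, tag).text = value
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


doc = build_insert_document(
    "http://www.example.com/dir/page.html",
    "Sprockets & Gears",
    "Sizes < 10mm are listed first.",
)
print(doc.decode("utf-8"))
```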
Any element whose text data might not be XML-safe (e.g. binary characters in the <Body>) should be base64-encoded, with the attribute dt:dt="bin.base64" set in its tag. For example, the text data of the <Refs> and <Errors> elements is always base64-encoded. The XML namespace prefix dt must then also be declared as urn:schemas-microsoft-com:datatypes (xmlns:dt="urn:schemas-microsoft-com:datatypes") in the root <ThunderstoneReplication> element.
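A minimal Python sketch of this encoding, using the standard base64 and xml.etree.ElementTree modules: registering the dt prefix makes ElementTree write the xmlns:dt declaration on the root element, as required above. The helper add_base64_element is a hypothetical name.

```python
import base64
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"
# Serialize this namespace with the "dt" prefix; ElementTree declares it
# (xmlns:dt="...") on the root element of the output.
ET.register_namespace("dt", DT_NS)


def add_base64_element(parent: ET.Element, tag: str, raw: bytes) -> ET.Element:
    """Append <tag dt:dt="bin.base64">...</tag> holding raw as base64 text."""
    el = ET.SubElement(parent, tag)
    el.set(f"{{{DT_NS}}}dt", "bin.base64")
    el.text = base64.b64encode(raw).decode("ascii")
    return el


root = ET.Element("ThunderstoneReplication")
item = ET.SubElement(root, "Item")
ET.SubElement(item, "Type").text = "I"
# A NUL byte is not allowed in XML text, so this body must be base64-encoded.
add_base64_element(item, "Body", b"text with a stray \x00 byte")
print(ET.tostring(root, encoding="unicode"))
```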
The elements are:
<Type>
The action to take with this data. The text value may be one of:
I - Insert the data (overwrite all previous data for the URL, if any)
D - Delete the URL
DP - Delete the URL as a pattern (e.g. http://www.example.com/dir/*)
U - Update the URL, leaving unspecified fields unchanged
UI - Update search indexes (call after a batch of inserts/deletes)
<Size>
The integer size of the original document.
<Visited>
When the document was fetched, in YYYY-MM-DD HH:MM:SS format.
<Dlsecs>
Number of seconds taken to download the document.
<Depth>
Depth of the URL from a Base URL: e.g. 0 is a Base URL, 1 is one click away, etc.
<Url>
The URL of the document.
<Title>
The title of the document.
<Body>
The formatted body of the document.
<Keywords>
Any keywords for the document.
<Description>
The description of the document.
<Meta>
Any meta data for the document.
<Category>
The category the document is in, if any. Must be a category name from the profile's Categories.
<Modified>
The Last-Modified date of the document, in YYYY-MM-DD HH:MM:SS format.
<NextCheck>
When the document should be refreshed, in YYYY-MM-DD HH:MM:SS format.
<Views>
Number of views of the document: how many times it has been shown in search results.
<Clicks>
Number of clicks of the document: how many times it has been clicked on in search results.
<CTR>
Click-through ratio: the ratio of clicks to views, as a floating-point number.
<Pop>
Document popularity: the number of references (links) to it.
<MimeType>
The MIME type of the content served at the URL, or provided in <RawData>.
<Charset>
Character set of the <Body> data. It should correspond to the profile's Storage Charset setting. If a charset other than the Storage Charset is used, it should be a standard IANA charset that the Search Appliance can convert to the Storage Charset.
<Refs>
Optional element with the references (child links) of the document.
<Errors>
Optional element with the errors of the document.
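The three date fields (<Visited>, <Modified>, <NextCheck>) share one format, and <CTR> is derived from <Clicks> and <Views>. A minimal Python sketch of formatting helpers follows. The six-decimal CTR mirrors the example document (0.000000); reporting 0 for a document with no views is an assumption, and the text above does not say which time zone the timestamps should use. Function names are hypothetical.

```python
from datetime import datetime, timedelta

# Format used by <Visited>, <Modified> and <NextCheck>: YYYY-MM-DD HH:MM:SS
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_timestamp(when: datetime) -> str:
    return when.strftime(TIMESTAMP_FORMAT)


def format_ctr(clicks: int, views: int) -> str:
    """Clicks divided by views, with six decimals as in the example document."""
    # Assumption: a document that was never shown gets a CTR of 0.
    ratio = clicks / views if views else 0.0
    return f"{ratio:.6f}"


visited = datetime(2005, 10, 25, 15, 25, 18)
print(format_timestamp(visited))                       # 2005-10-25 15:25:18
print(format_timestamp(visited + timedelta(hours=1)))  # 2005-10-25 16:25:18
print(format_ctr(0, 0))                                # 0.000000
```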
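The <Type> action codes can be checked before a document is built. The Python sketch below builds a pattern delete (DP) and a follow-up index update (UI) as two separate data documents, each of which would be posted as the data variable. Two assumptions are not stated in the element list: that one document may carry several <Item> elements (the helper allows it), and that a UI item needs no fields other than <Type>. The name build_document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Action codes accepted in <Type>.
ACTIONS = {"I", "D", "DP", "U", "UI"}


def build_document(items) -> bytes:
    """items: iterable of (type_code, {tag: text}) pairs, one per <Item>."""
    root = ET.Element("ThunderstoneReplication")
    for type_code, fields in items:
        if type_code not in ACTIONS:
            raise ValueError(f"unknown Type {type_code!r}")
        item = ET.SubElement(root, "Item")
        ET.SubElement(item, "Type").text = type_code
        for tag, text in fields.items():
            ET.SubElement(item, tag).text = text
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


# Remove every URL under /dir/, then ask for the search indexes to be updated.
delete_doc = build_document([("DP", {"Url": "http://www.example.com/dir/*"})])
# Assumption: a UI item needs no fields other than <Type>.
update_doc = build_document([("UI", {})])
```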