Discussion:
[MarkLogic Dev General] Difference between eNode and Data Node
Saptarshi Newyork
2009-03-23 11:30:15 UTC
Hi ,
I have a few questions:
 
1) What is the difference between an eNode and a dNode? I have read that E-nodes evaluate XQuery programs, XCC/XDBC requests, WebDAV requests, and other server requests, and that dNodes are the ones that talk directly to the database/forest. I have also read that if a request does not need any forest data to complete, it is evaluated entirely on the e-node. I do not understand how this is possible! If an eNode is meant for XQuery evaluation and XQuery needs XML to process, then every eNode request should talk to a dNode. Is there a caching mechanism? It would be great if anybody could explain this to me.
 
2) There are two failover mechanisms explained in the documentation: forest-level failover and eNode-level failover. It seems that failover at the forest data level is not handled by MarkLogic. For example, if the filesystem crashes, is there any way for MarkLogic Server to replicate the forest to other hosts in the same or a different cluster? If this feature is not presently supported, when can we expect it on the roadmap?
 
Thanks in advance.
 
regards,
Saptarshi
Eric Palmitesta
2009-03-23 14:35:53 UTC
To answer your first question, the following XQuery script won't touch
your D-Node(s):

xquery version "1.0-ml";
(: Both elements are constructed in memory; no forest data is read. :)
(
  <p>the time is now { fn:current-dateTime() }</p>,
  <p>this is running on a { xdmp:platform() } platform</p>
)

I'm not an authority on the matter, so confirmation would be nice.

Cheers,

Eric
_______________________________________________
General mailing list
http://xqzone.com/mailman/listinfo/general
Geert Josten
2009-03-23 19:58:30 UTC
Saptarshi,

I am not an authority on this matter either, but I will try to explain as well as possible.

1) MarkLogic Server is designed to operate with evaluator nodes and database nodes. The database nodes access content stored in forests and perform search queries over the forests. The evaluator nodes are responsible for executing XQuery code, WebDAV requests, XDBC calls, etc. If the code to be executed doesn't access any content stored in the database (no cts:search calls, no doc() calls, etc.) but relies purely on content constructed in memory, then the database nodes are not accessed. It has nothing to do with caching of any kind; content can simply be constructed on the fly, for instance by embedding it in the XQuery script itself. The example Eric supplied is valid.

2) MarkLogic Server does not handle failover when a filesystem crashes. The documentation (http://developer.marklogic.com/pubs/4.0/books/cluster.pdf) explains that filesystem crashes should be handled by using a clustered filesystem. There are some suggestions in that document, but I can imagine that a RAID configuration might suffice for simple situations as well. Forest-level failover works as follows: you assign multiple hosts to one physically shared forest. These hosts are listed in order. If the 1st host drops out, the 2nd host takes that forest over. No data has to be replicated that way, which makes it more efficient and much more scalable.

At the front end you also have the HTTP servers etc. on the hosts. You can have as many as you like. By putting a hardware or software load balancer in front, you can distribute calls coming in at a single port to all available 'evaluator' nodes. Load balancing is not handled by MarkLogic Server itself; there are plenty of solutions readily available, so why bother. ;-)

I am not sure whether an HTTP server is the actual evaluator node, but I don't think so. There is this Task Server configuration page within the MarkLogic Server Group Administration. This configures Task threads on all hosts within a single group. I have the impression these act as evaluator nodes and the Databases in the MarkLogic Server Administration correspond to the database nodes. Forest-level failover is configured at the Forest configuration pages.
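
To make point 1 a bit more concrete, here is a minimal sketch contrasting the two kinds of expression. The URI "/example.xml" is purely hypothetical, and this only illustrates which expressions need forest data, not how the server schedules the work internally.

```xquery
xquery version "1.0-ml";

(: Constructed entirely in memory: no forest data is needed. :)
let $in-memory := <greeting>hello at { fn:current-dateTime() }</greeting>

(: Reads a stored document, so the evaluating host has to get the data
   from whichever host(s) serve the forests of the database.
   "/example.xml" is a hypothetical URI used only for illustration. :)
let $from-forest := fn:doc("/example.xml")

return (
  $in-memory,
  if (fn:exists($from-forest))
  then $from-forest
  else <missing uri="/example.xml"/>
)
```

If you delete the fn:doc() part, what remains is the same kind of query as Eric's example.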

I hope this makes things clearer to you!

Kind regards,
Geert
Drs. G.P.H. Josten
Consultant


http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this email message originates from Daidalos BV and is intended solely for the addressee. If you have received this message unintentionally, we ask you to delete it. No rights can be derived from this message.
Danny Sokolsky
2009-03-23 20:18:19 UTC
Hi Geert,

Thanks for the great description. I will just add one thing to what you
said:

Whether a host acts as an e-node or a d-node depends on what it is doing
at the time, and a given host in a MarkLogic cluster can behave as an
e-node, a d-node, or both. For example, if you have a single host
instance of MarkLogic Server, that host acts as both the e-node (to
evaluate XQuery) and as the d-node (to perform forest operations on
content).
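
As a rough illustration (a sketch only, assuming the built-ins xdmp:host(), xdmp:hosts() and xdmp:host-name() are available in your server version), a query like the following reports which host is evaluating the current request, that is, which host is acting as the e-node for it, next to the list of all hosts in the cluster. The element names are made up for the example.

```xquery
xquery version "1.0-ml";

(
  (: xdmp:host() is the ID of the host evaluating this request. :)
  <evaluated-on>{ xdmp:host-name(xdmp:host()) }</evaluated-on>,

  (: xdmp:hosts() returns the IDs of all hosts in the cluster. :)
  for $h in xdmp:hosts()
  return <cluster-host>{ xdmp:host-name($h) }</cluster-host>
)
```

On a single-host installation both parts name the same host, which matches the single-host case described above.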

-Danny

Saptarshi Newyork
2009-03-24 05:08:06 UTC
Hi All,
Thanks for the great description and examples. I still have a couple of questions to add:
 
1) I understand that the same node can work as both eNode and dNode, but if I want separate eNodes and dNodes, is there any difference in how the hosts for the two roles are configured?
 
2) In an architecture where both eNodes and dNodes exist, suppose a request that requires access to a forest comes to an eNode. I have read that the eNode will send the request to a dNode to access the forest. But every evaluator node (eNode) is also attached to some forests. How is this transfer of the request achieved? How can an eNode make a call to a dNode? Is there any configuration or coding required to achieve this? Can an eNode access its own forest under any scenario?
 
Thanks in advance.
regards,
Saptarshi

Danny Sokolsky
2009-03-24 16:52:12 UTC
Hi Saptarshi,



There is no requirement that an e-node must also have a forest attached
to it. In fact, in large implementations, the norm is to configure
e-nodes to do only e-node work and d-nodes to do only d-node work. That
is what Groups are for. You might, for example, set up 2 groups, one
for e-nodes and one for d-nodes. The d-node groups do not need to have
any app servers on them, and the e-node groups do not need any databases
or forests. This means that each node can devote its entire life (and
all of its resources) to its role. For example, if you have a group
that only has d-nodes, you do not need to allocate much expanded tree
cache (that is used for e-node processing). Similarly, if a group is
only e-nodes, they do not need to allocate much list cache or compressed
tree cache. Be extra careful when changing these values, however, and
make sure you know what role your hosts are playing.



Hosts in a MarkLogic Server cluster communicate via the xdqp protocol,
which is an internal communication mechanism. Any changes to the
cluster are communicated to the other hosts via xdqp, and forest data is
transferred to the e-node via xdqp. All of this communication happens
automatically.
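
If you want to see how hosts are divided over groups in an existing cluster, something along these lines should work. This is only a sketch: it assumes the built-ins xdmp:groups(), xdmp:group-name(), xdmp:group-hosts() and xdmp:host-name() are available in your server version, and the element names are invented for the example.

```xquery
xquery version "1.0-ml";

(: One <group> element per group in the cluster, listing its hosts. :)
for $g in xdmp:groups()
return
  <group name="{ xdmp:group-name($g) }">{
    for $h in xdmp:group-hosts($g)
    return <host>{ xdmp:host-name($h) }</host>
  }</group>
```

With separate e-node and d-node groups as described above, you would expect two <group> elements, each listing only the hosts that play that role.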



Hope that helps,

-Danny



Geert Josten
2009-03-24 20:05:35 UTC
Hi Danny,

How are databases in the d-group accessed from the e-nodes in the e-group? Is it sufficient that both groups are defined in the same cluster? Databases are configured outside the Group configuration. Is a d-group created by assigning to it only the hosts connected to a forest, and assigning all other hosts to the e-group, which contains the needed app servers?

It would be helpful if you could give a more explicit example. Is that possible without writing a lengthy email?

Kind regards,
Geert
Post by Danny Sokolsky
-----Original Message-----
Danny Sokolsky
Sent: dinsdag 24 maart 2009 17:52
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi Saptarshi,
There is no requirement that an e-node must also have a
forest attached to it. In fact, in large implementations,
the norm is to configure e-nodes to do only e-node work and
d-nodes to do only d-node work. That is what Groups are for.
You might, for example, set up 2 groups, one for e-nodes and
one for d-nodes. The d-node groups do not need to have any
app servers on them, and the e-node groups do not need any
databases or forests. This means that each node can devote
its entire life (and all of its resources) to its role. For
example, if you have a group that only has d-nodes, you do
not need to allocate much expanded tree cache (that is used
for e-node processing). Similarly, if a group is only
e-nodes, they do not need to allocate much list cache or
compressed tree cache. Be extra careful when changing these
values, however, and make sure you know what role your hosts
are playing.
Hosts in a MarkLogic Server cluster communicate via the xdqp
protocol, which is an internal communication mechanism. Any
changes to the cluster are communicated to the other hosts
via xdqp, and forest data is transferred to the e-node via
xdqp. All of this communication happens automatically.
Hope that helps,
-Danny
Saptarshi Newyork
Sent: Monday, March 23, 2009 10:08 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi All,
Thanks for a great description and examples. I still have
1) I understand that same node can work as both eNode and
dNode, bu if I want to have separate eNode and dNode, in that
case, is there any difference in configuration of the host
for these two nodes?
2) In an architecture where both eNode and dNode exist,
suppose a request comes to eNode which requires an access to
forest. Then it's written that eNode will send the request to
dNode to access the forest. But every evaluator node(eNode)
is also attached to some forests. How this transfer of
request is achieved? How eNode can make a call to dNode? Is
there any configuration or coding required to achieve this?
Can under any scenario eNode access its own forest?
Thanks in advance.
regards,
Saptarshi
Subject: RE: [MarkLogic Dev General] Difference between
eNode and Data Node
To: "General Mark Logic Developer Discussion"
Date: Monday, March 23, 2009, 4:18 PM
Hi Geert,
Thanks for the great description. I will just add one
thing to what you
Whether a host acts as an e-node or a d-node depends on
what it is doing
at the time, and a given host in a MarkLogic cluster
can behave as an
e-node, a d-node, or both. For example, if you have a
single host
instance of MarkLogic Server, that host acts as both
the e-node (to
evaluate XQuery) and as the d-node (to perform forest
operations on
content).
-Danny
-----Original Message-----
developer.marklogic.com>
developer.marklogic.com> ] On Behalf Of Geert
Josten
Sent: Monday, March 23, 2009 12:59 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between
eNode and Data
Node
Saptarshi,
I am not an authority on this matter either, but I will
try to explain
as well as possible..
1) MarkLogic Server is designed to operate with
evaluator nodes and
database nodes. The database nodes access content
stored in forests and
perform search queries over the forests. The evaluator nodes are
responsible for executing the Xquery code, webdav
requests, XDBC calls
etc. If the involved code to be executed doesn't access
any content
stored in the database (no cts:search calls, no doc
statements, etc),
but purely relies on in memory constructed content,
then database nodes
are not accessed. It has nothing to do with caching of
any kind, it is
just that content can be constructed on the fly, by
just incorporating
it in the Xquery script for instance. The example Eric
supplied is
valid.
2) MarkLogic Server does not handle failover when
filesystems crash. The
documentation
(http://developer.marklogic.com/pubs/4.0/books/cluster.pdf) explains
that filesystem crashes should be handled by using a clustered
filesystem. There are some suggestions in that
document, but I can
imagine that a RAID configuration might suffice for
simples situations
as well. Forest-level failover works as follows: you
assign multiple
hosts to one physically shared forest. These hosts are
listed in order.
If the 1st host drops out, the 2nd host takes that forest over.
Replication of data is not necessary that way, making
it more efficient
and much more scalable. At the front-end you have also
the HTTP servers
etc on the hosts. You can have as many as you like. By putting a
hardware or software load-balancer in front you can
distribute calls
coming in at a single port to all available 'evaluator' nodes.
Load-balancing is not handled by MarkLogic Server
itself, there are
plenty solutions readily available so why bother. ;-)
I am not sure whether an HTTP server is the actual evaluator node, but
I don't think so. There is a Task Server configuration page within the
MarkLogic Server Group Administration. This configures Task threads on
all hosts within a single group. I have the impression these act as
evaluator nodes and the Databases in the MarkLogic Server
Administration correspond to the database nodes. Forest-level failover
is configured on the Forest configuration pages.
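As a rough illustration of point 1, the query below contrasts content
that is built purely in memory with content that has to be read from a
forest. It is only a sketch: the URI /foo.xml and the element names are
made up for the example, and which hosts are actually contacted depends
on how the cluster is set up.

```xquery
xquery version "1.0-ml";

(: Built entirely in memory: nothing here reads from a forest. :)
let $in-memory :=
  <greeting generated="{ fn:current-dateTime() }">hello</greeting>

(: Reads a stored document, so the evaluator has to ask the forests of
   the current database for it. "/foo.xml" is a made-up URI; fn:doc
   returns the empty sequence when no such document exists. :)
let $stored := fn:doc("/foo.xml")

return (
  $in-memory,
  if (fn:exists($stored)) then $stored else <missing uri="/foo.xml"/>
)
```

Dropping the fn:doc part leaves a request of the same kind as Eric's
example, which never needs forest data.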
I hope this makes things clearer to you!
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this email message originates from
Daidalos BV and is intended solely for the addressee. If you have
received this message unintentionally, we ask that you delete it. No
rights can be derived from this message.
Danny Sokolsky
2009-03-24 20:43:43 UTC
Permalink
Yes, any e-node host in the cluster can access any database in the
cluster. Think about the d-nodes as hosting forests. The database is
an abstraction on top of some number of forests. The e-nodes host
interfaces into the databases, which are App Servers (HTTP, XDBC, and
WebDAV).

As an example, consider a 3-node cluster (this is a bit simplified):

host1: e-node group, has an HTTP server on port 80 talking to database
d1. It has no forest assignments.

host2: d-node group, has forest f1, which is attached to database d1.

host3: d-node group, has forest Security, which is the forest for the
security db for the cluster.

In order to access content in database d1, you must come in through the
App Server on host1. A request such as doc("/foo.xml") is processed
(roughly) as follows:

* an HTTP request to run the code doc("/foo.xml") is submitted to the
HTTP server on host1 (for example, there might be an XQuery file named
doc.xqy under the app server root with the code that is accessed by an
HTTP request: http://host1/doc.xqy).
* host1 (the evaluator node) gets the HTTP request, parses the XQuery.
* this request requires forest data, so host1 asks the forests for this
database (only one in this example) for the data needed (the document
/foo.xml in this example). This communication happens over xdqp.
* host2 responds by sending the forest data for /foo.xml back to host1
(via xdqp).
* host1 then processes the result and returns it back to the client.
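
The steps above could be tried out with a doc.xqy along these lines.
This is a sketch, not the module from the example cluster: the wrapper
element names are invented, and /foo.xml is assumed to have been loaded
into database d1 beforehand. It reports which host evaluated the
request and which forests belong to the App Server's database, next to
the document itself.

```xquery
xquery version "1.0-ml";

(: doc.xqy, assumed to sit under the root of the HTTP App Server on
   host1. The element names are illustrative only. :)
<result>
  <evaluated-on>{ xdmp:host-name(xdmp:host()) }</evaluated-on>
  <database>{ xdmp:database-name(xdmp:database()) }</database>
  <forests>{
    for $forest-id in xdmp:database-forests(xdmp:database())
    return <forest>{ xdmp:forest-name($forest-id) }</forest>
  }</forests>
  <content>{ fn:doc("/foo.xml") }</content>
</result>
```

In the 3-node example one would expect the evaluating host to be host1
and the forest list to contain f1, even though f1 lives on host2.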

As I said, this is a bit simplified. For example, the host that is
hosting the Security forest also gets involved, but its involvement is a
little more complex and is not really important for understanding how
the e-node / d-node communication takes place. Suffice it to say that
every request is authenticated, and a user can only see and do what he
is authorized to.

There are other simplifications too. But hopefully that gives you a
better idea of how it works.

-Danny


-----Original Message-----
From: general-***@developer.marklogic.com
[mailto:general-***@developer.marklogic.com] On Behalf Of Geert
Josten
Sent: Tuesday, March 24, 2009 1:06 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode and Data
Node

Hi Danny,

How are Databases in the d-group accessed from the e-nodes in the
e-group? Is it sufficient that both groups are defined in the same
cluster? Databases are configured outside the Group configuration. Is a
d-group created by only assigning hosts connected to a forest, and
assigning all other hosts to the e-group which contains the needed app
servers?

It would be helpful if you could give a more explicit example. Is that
possible without writing a lengthy email?

Kind regards,
Geert
-----Original Message-----
From: Danny Sokolsky
Sent: Tuesday, 24 March 2009 17:52
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi Saptarshi,
There is no requirement that an e-node must also have a
forest attached to it. In fact, in large implementations,
the norm is to configure e-nodes to do only e-node work and
d-nodes to do only d-node work. That is what Groups are for.
You might, for example, set up 2 groups, one for e-nodes and
one for d-nodes. The d-node groups do not need to have any
app servers on them, and the e-node groups do not need any
databases or forests. This means that each node can devote
its entire life (and all of its resources) to its role. For
example, if you have a group that only has d-nodes, you do
not need to allocate much expanded tree cache (that is used
for e-node processing). Similarly, if a group is only
e-nodes, they do not need to allocate much list cache or
compressed tree cache. Be extra careful when changing these
values, however, and make sure you know what role your hosts
are playing.
Hosts in a MarkLogic Server cluster communicate via the xdqp
protocol, which is an internal communication mechanism. Any
changes to the cluster are communicated to the other hosts
via xdqp, and forest data is transferred to the e-node via
xdqp. All of this communication happens automatically.
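To see how the caches mentioned above are currently set per group, a
read-only query like the following may help. It is a sketch that
assumes the Admin API library module (/MarkLogic/admin.xqy) is
available in your release and that the user running it has the
privileges to read the configuration; the result element names are
made up. It changes nothing.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Report the cache sizes configured for every group in the cluster.
   Nothing is modified or saved. :)
let $config := admin:get-configuration()
for $group-id in admin:get-group-ids($config)
return
  <group name="{ admin:group-get-name($config, $group-id) }">
    <expanded-tree-cache>{
      admin:group-get-expanded-tree-cache-size($config, $group-id)
    }</expanded-tree-cache>
    <list-cache>{
      admin:group-get-list-cache-size($config, $group-id)
    }</list-cache>
    <compressed-tree-cache>{
      admin:group-get-compressed-tree-cache-size($config, $group-id)
    }</compressed-tree-cache>
  </group>
```

Comparing the output for an e-node group and a d-node group shows
whether the two roles are really sized differently.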
Hope that helps,
-Danny
Saptarshi Newyork
Sent: Monday, March 23, 2009 10:08 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi All,
Thanks for a great description and examples. I still have
1) I understand that the same node can work as both eNode and
dNode, but if I want to have separate eNode and dNode, in that
case, is there any difference in configuration of the host
for these two nodes?
2) In an architecture where both eNode and dNode exist,
suppose a request comes to an eNode which requires access to a
forest. Then it's written that the eNode will send the request to
a dNode to access the forest. But every evaluator node (eNode)
is also attached to some forests. How is this transfer of the
request achieved? How can an eNode make a call to a dNode? Is
there any configuration or coding required to achieve this?
Can an eNode access its own forest under any scenario?
Thanks in advance.
regards,
Saptarshi
Subject: RE: [MarkLogic Dev General] Difference between
eNode and Data Node
To: "General Mark Logic Developer Discussion"
Date: Monday, March 23, 2009, 4:18 PM
Hi Geert,
Thanks for the great description. I will just add one
thing to what you
Whether a host acts as an e-node or a d-node depends on what it is
doing at the time, and a given host in a MarkLogic cluster can behave
as an e-node, a d-node, or both. For example, if you have a single host
instance of MarkLogic Server, that host acts as both the e-node (to
evaluate XQuery) and as the d-node (to perform forest operations on
content).
-Danny
Geert Josten
2009-03-24 21:09:25 UTC
Permalink
Hi Danny,

Yes, it certainly makes things more clear! At least to me..

One additional question that is more or less in line with this topic: how should one handle large amounts of data? Just putting all in one forest doesn't make sense. Handling content in different databases isn't always an option either and perhaps not very efficient when it is necessary to be able to search over all content.

Is it sufficient to just create multiple forests and assign those to one database? How is content divided over these forests? Or am I barking up the wrong tree? I didn't find much documentation on this particular field.

Kind regards,
Geert
Danny Sokolsky
2009-03-24 21:39:48 UTC
Permalink
Hi Geert,

You are definitely barking up the right tree...

As your content grows, you add more forests. As your number of forests
grows, you add more d-nodes to host those forests. There are some
guidelines about forest sizes in the Scalability documentation
(http://developer.marklogic.com/pubs/4.0/books/cluster.pdf); see page
7. The executive summary is that, as a rule of thumb (there are always
exceptions), at 200GB or 32 million fragments (on a 64-bit system), you
should add another forest.
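
The rule of thumb can be turned into a quick back-of-the-envelope
calculation. The input numbers below are invented purely for
illustration, and the 200GB and 32-million thresholds are the guideline
figures quoted above, not hard limits.

```xquery
xquery version "1.0-ml";

(: Hypothetical content volume; replace with your own estimates. :)
let $content-gb := 1000
let $fragments := 100000000

(: One forest per 200GB and one per 32 million fragments, whichever
   asks for more forests. :)
let $by-size := fn:ceiling($content-gb div 200)
let $by-fragments := fn:ceiling($fragments div 32000000)
return fn:max(($by-size, $by-fragments))
```

For these made-up inputs the size limit dominates (5 forests by size,
4 by fragment count), so the query returns 5.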

The system is designed to scale this way. The system mixes the content
across the forests (although it also allows you to choose which forest
to put content in). This design allows it to scale to extremely large
systems.
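
Choosing the forest explicitly can be sketched as follows, using the
optional forest argument of xdmp:document-insert. The forest name f1 is
borrowed from the 3-node example earlier in this thread, the URI and
content are made up, and the forest must be attached to the database
the request runs against. If the argument is left out, the server picks
a forest itself.

```xquery
xquery version "1.0-ml";

(: Insert a document into a specific forest. "f1" and "/foo.xml" are
   example names; xdmp:forest raises an error if no such forest
   exists. :)
xdmp:document-insert(
  "/foo.xml",
  <foo>example content</foo>,
  xdmp:default-permissions(),
  xdmp:default-collections(),
  0,
  xdmp:forest("f1"))
```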

-Danny

-----Original Message-----
From: general-***@developer.marklogic.com
[mailto:general-***@developer.marklogic.com] On Behalf Of Geert
Josten
Sent: Tuesday, March 24, 2009 2:09 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode and Data
Node

Hi Danny,

Yes, it certainly makes things more clear! At least to me..

One additional question that is more or less in line with this topic:
how should one handle large amounts of data? Just putting all in one
forest doesn't make sense. Handling content in different databases isn't
always an option either and perhaps not very efficient when it is
necessary to be able to search over all content.

Is it sufficient to just create multiple forests and assign those to one
database? How is content divided over these forests? Or am I barking up
the wrong tree? I didn't find much documentation on this particular
field.

Kind regards,
Geert
Post by Danny Sokolsky
-----Original Message-----
Danny Sokolsky
Sent: dinsdag 24 maart 2009 21:44
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Yes, any e-node host in the cluster can access any database in the
cluster. Think about the d-nodes as hosting forests. The database is
an abstraction on top of some number of forests. The e-nodes host
interfaces into the databases, which are App Servers (HTTP, XDBC, and
WebDAV).
host1: e-node group, has an HTTP server on port 80 talking to database
d1. It has no forest assignments.
host2: d-node group, has forest f1, which is attached to database d1.
host3: d-node group, has forest Security, which is the forest for the
security db for the cluster.
In order to access content in database d1, you must come in
through the
App Server on host1. A request such as doc("/foo.xml") is processed
* an http request to run the code doc("/foo.xml") is submitted to the
HTTP server on host1 (for example, there might be an XQuery file names
doc.xqy under the app server root with the code that is accessed by an
http request: http://host1/doc.xqy).
* host1 (the evaluator node) gets the HTTP request, parses the XQuery.
* this request requires forest data, so host1 asks the
forests for this
database (only one in this example) for the data needed (the document
/foo.xml in this example). This communication happens over xdqp.
* host2 responds by sending the forest data for /foo.xml back to host1
(via xdqp).
* host1 then processes the result and returns it back to the client.
As I said, this is a bit simplified. For example, the host that is
hosting the Security forest also gets involved, but its
involvement is a
little more complex and is not really important for understanding how
the e-node / d-node communication takes place. Suffice it to say that
every request is authenticated and a user can only see and do
that which
he is authorized.
There are other simplifications too. But hopefully that gives you a
better idea of how it works.
-Danny
-----Original Message-----
Josten
Sent: Tuesday, March 24, 2009 1:06 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode and Data
Node
Hi Danny,
How are Databases in the d-group accessed from the e-nodes in the
e-group? Is it sufficient that both groups are defined in the same
cluster? Databases are configured outside the Group
configuration. Is a
d-group created by only assinging hosts connected to a forest, and
assigning all other hosts to the e-group which contains the needed app
servers?
It would be helpful if you could give a more explict example. Is that
possible without writing a lenghty email?
Kind regards,
Geert
Post by Danny Sokolsky
-----Original Message-----
Danny Sokolsky
Sent: dinsdag 24 maart 2009 17:52
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi Saptarshi,
There is no requirement that an e-node must also have a
forest attached to it. In fact, in large implementations,
the norm is to configure e-nodes to do only e-node work and
d-nodes to do only d-node work. That is what Groups are for.
You might, for example, set up 2 groups, one for e-nodes and
one for d-nodes. The d-node groups do not need to have any
app servers on them, and the e-node groups do not need any
databases or forests. This means that each node can devote
its entire life (and all of its resources) to its role. For
example, if you have a group that only has d-nodes, you do
not need to allocate much expanded tree cache (that is used
for e-node processing). Similarly, if a group is only
e-nodes, they do not need to allocate much list cache or
compressed tree cache. Be extra careful when changing these
values, however, and make sure you know what role your hosts
are playing.
Hosts in a MarkLogic Server cluster communicate via the xdqp
protocol, which is an internal communication mechanism. Any
changes to the cluster are communicated to the other hosts
via xdqp, and forest data is transferred to the e-node via
xdqp. All of this communication happens automatically.
Hope that helps,
-Danny
Saptarshi Newyork
Sent: Monday, March 23, 2009 10:08 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between eNode
and Data Node
Hi All,
Thanks for a great description and examples. I still have
1) I understand that same node can work as both eNode and
dNode, bu if I want to have separate eNode and dNode, in that
case, is there any difference in configuration of the host
for these two nodes?
2) In an architecture where both eNode and dNode exist,
suppose a request comes to eNode which requires an access to
forest. Then it's written that eNode will send the request to
dNode to access the forest. But every evaluator node(eNode)
is also attached to some forests. How this transfer of
request is achieved? How eNode can make a call to dNode? Is
there any configuration or coding required to achieve this?
Can under any scenario eNode access its own forest?
Thanks in advance.
regards,
Saptarshi
Subject: RE: [MarkLogic Dev General] Difference between
eNode and Data Node
To: "General Mark Logic Developer Discussion"
Date: Monday, March 23, 2009, 4:18 PM
Hi Geert,
Thanks for the great description. I will just add one
thing to what you
Whether a host acts as an e-node or a d-node depends on
what it is doing
at the time, and a given host in a MarkLogic cluster
can behave as an
e-node, a d-node, or both. For example, if you have a
single host
instance of MarkLogic Server, that host acts as both
the e-node (to
evaluate XQuery) and as the d-node (to perform forest
operations on
content).
-Danny
-----Original Message-----
developer.marklogic.com>
developer.marklogic.com> ] On Behalf Of Geert
Josten
Sent: Monday, March 23, 2009 12:59 PM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Difference between
eNode and Data
Node
Saptarshi,
I am not an authority on this matter either, but I will
try to explain as well as I can.
1) MarkLogic Server is designed to operate with evaluator
nodes and database nodes. The database nodes access content
stored in forests and perform search queries over the forests.
The evaluator nodes are responsible for executing the XQuery
code, WebDAV requests, XDBC calls, etc. If the code to be
executed doesn't access any content stored in the database (no
cts:search calls, no doc calls, etc.) and relies purely on
content constructed in memory, then the database nodes are not
accessed. It has nothing to do with caching of any kind;
content can simply be constructed on the fly, for instance by
including it in the XQuery script itself. The example Eric
supplied is valid.
2) MarkLogic Server does not handle failover when a
filesystem crashes. The documentation
(http://developer.marklogic.com/pubs/4.0/books/cluster.pdf) explains
that filesystem crashes should be handled by using a clustered
filesystem. There are some suggestions in that document, but I
can imagine that a RAID configuration might suffice for simple
situations as well. Forest-level failover works as follows:
you assign multiple hosts to one physically shared forest.
These hosts are listed in order. If the 1st host drops out,
the 2nd host takes the forest over. Replication of data is not
necessary that way, making it more efficient and much more
scalable. On the front end you also have the HTTP servers etc.
on the hosts. You can have as many as you like. By putting a
hardware or software load balancer in front, you can
distribute calls coming in at a single port to all available
'evaluator' nodes. Load balancing is not handled by MarkLogic
Server itself; there are plenty of solutions readily
available, so why bother. ;-)
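The distinction made in point 1 can be illustrated with a small query in the style of Eric's example. This is a minimal sketch: the URI "/example.xml" is hypothetical, and on a single-host installation the same host plays both roles.

```xquery
xquery version "1.0-ml";

(: Constructed entirely in memory: no forest data is needed,
   so this part is evaluated on the e-node alone. :)
let $in-memory :=
  <report generated="{ fn:current-dateTime() }">
    <sum>{ fn:sum((1, 2, 3)) }</sum>
  </report>

(: Reads a stored document: the e-node has to obtain forest data
   from whichever host serves the forest (the d-node).
   "/example.xml" is a hypothetical URI. :)
let $from-forest := fn:doc("/example.xml")

return ($in-memory, $from-forest)
```

No extra code is needed in the second case; the query simply calls fn:doc or cts:search, and the server decides where the forest data comes from.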
I am not sure whether an HTTP server is the actual
evaluator node, but I
don't think so. There is this Task Server configuration
page within the
MarkLogic Server Group Administration. This configures
Task threads on
all hosts within a single group. I have the impression
these act as
evaluator nodes and the Databases in the MarkLogic
Server Administration
correspond to the database nodes. Forest-level failover
is configured at
the Forest configuration pages.
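The ordering rule for forest-level failover described above (hosts listed in order, the next one takes over when the first drops out) can be modelled in a few lines of plain XQuery. This is only an illustration of the rule, not MarkLogic's implementation or configuration: the host elements, their "up" attribute and the function local:forest-owner are all hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical model: $hosts are listed in failover order and
   carry an "up" flag. The first host that is still up takes the forest. :)
declare function local:forest-owner($hosts as element(host)*) as xs:string?
{
  ($hosts[@up = "true"])[1]/@name/fn:string()
};

let $hosts := (
  <host name="host-1" up="false"/>,
  <host name="host-2" up="true"/>,
  <host name="host-3" up="true"/>
)
(: host-1 is down, so the forest goes to host-2. :)
return local:forest-owner($hosts)
```

Because all listed hosts see the same physically shared forest, the model only has to pick an owner; no data is copied, which matches the point about replication not being necessary.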
I hope this makes things clearer to you!
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
http://www.daidalos.nl/
KvK 27164984
The information sent in or with this email message
originates from Daidalos BV and is intended solely for the
addressee. If you have received this message unintentionally,
we ask that you delete it. No rights can be derived from this
message.
_______________________________________________
General mailing list
http://xqzone.com/mailman/listinfo/general
