Friday, December 26, 2014

Schema design: multi-language support

There is another thread with the same subject, but my issue is a bit different. I am also looking into multi-language support for documents in MongoDB, and I started out with the following schema in mind:

{
  title: {
    en: "Title in English",
    ja: "日本語のタイトル",
    fr: "Subject en Francais"
  },
  content: {
    en: "Content in English",
    ja: "日本語のコンテンツ",
    fr: "Contenu en Francais"
  }
}
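To illustrate how a document in this shape might be read back in a single language, here is a minimal sketch. `pickLanguage`, `TRANSLATED_FIELDS`, and the fall-back-to-English behaviour are hypothetical choices made for the example; they are not part of MongoDB or of the author's own code.

```javascript
// Hypothetical list of fields stored as { <lang>: <text> } maps.
const TRANSLATED_FIELDS = ["title", "content"];

// Return a copy of `doc` in which every translated field is reduced to the
// string for `lang`, or to the string for `fallback` when `lang` is missing.
// Fields not listed in TRANSLATED_FIELDS are copied unchanged.
function pickLanguage(doc, lang, fallback = "en") {
  const result = { ...doc };
  for (const field of TRANSLATED_FIELDS) {
    const translations = doc[field];
    if (translations) {
      result[field] = translations[lang] ?? translations[fallback];
    }
  }
  return result;
}

const article = {
  title: { en: "Title in English", ja: "日本語のタイトル", fr: "Subject en Francais" },
  content: { en: "Content in English", ja: "日本語のコンテンツ", fr: "Contenu en Francais" },
};

console.log(pickLanguage(article, "ja").title); // 日本語のタイトル
console.log(pickLanguage(article, "de").title); // Title in English (no "de", so fallback)
```

Listing the translated fields explicitly avoids treating every object-valued field (for example an ObjectId `_id`) as a language map.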

I have a class that builds documents in this shape more or less automatically, but I am running into indexing problems. I want to search the text using a text index, so I ran:

db.articles.ensureIndex({ title: "text", content: "text" })

But when I search for text in the database, I get no results. My guess is that text indexes don't look inside subdocuments. I also tried the "$**" wildcard to index all fields, but that doesn't reach into subdocuments either, only the top-level document, and it would index ALL string fields anyway. My documents are rather large, so that isn't practical.
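A text index on `title` targets the field itself, and in this schema that field holds an object rather than a string. MongoDB index keys accept dot notation for embedded fields, so one possible workaround is to list each language path explicitly in the key spec. Because a collection can have only one text index, all paths have to go into a single spec. The sketch below assumes the set of languages is known up front; `buildTextIndexSpec` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical helper: build one text-index key spec covering every language
// of every translated field, using dot notation ("title.en", "title.ja", ...).
function buildTextIndexSpec(fields, languages) {
  const spec = {};
  for (const field of fields) {
    for (const lang of languages) {
      spec[`${field}.${lang}`] = "text";
    }
  }
  return spec;
}

const spec = buildTextIndexSpec(["title", "content"], ["en", "ja", "fr"]);
console.log(JSON.stringify(spec));
// {"title.en":"text","title.ja":"text","title.fr":"text","content.en":"text","content.ja":"text","content.fr":"text"}

// In the mongo shell the spec would then be passed on, e.g.:
//   db.articles.ensureIndex(spec)
```

This only makes the nested strings reachable by the index; how each language is stemmed is a separate question, governed by the text index's `default_language` and `language_override` options. Adding a new language also means rebuilding the index with a new spec.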

Previously, I had:

{
  lang: "en",
  title: "Title",
  content: "Content"
}

and the ensureIndex call above worked fine. The catch is that with that schema I need some way to tell the system which documents are translations of which.

Isn't there a way to use the <lang>: <content> schema for translated content while still being able to run text searches on those documents?

I'm also open to discussing other ways of implementing multilingual content; I have written several articles on the subject on my blog.



I'm no expert, but I think you'd be better off keeping each translation in its own document. If you need to tie a group of translations back to a "master" document, give each document an id and have the translations reference the master's id:

{
  _id: "1",
  lang: "en",
  title: "Title",
  content: "Content"
}
{
  _id: "2",
  lang: "fr",
  title: "Subject",
  content: "Contenu",
  master: "1"
}
{
  _id: "3",
  lang: "de",
  title: "Titel",
  content: "Inhalt",
  master: "1"
}
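With this layout a text search returns individual translation documents, and the flat `{ title: "text", content: "text" }` index works as before. To get from a search hit to its sibling translations, follow `master`, or use the hit's own `_id` if it is the master. The in-memory sketch below shows that lookup; `findTranslations` is a hypothetical helper, and against a real collection the same filter could be expressed as a query such as `{ $or: [{ _id: root }, { master: root }] }`.

```javascript
// The three documents from the example above.
const articles = [
  { _id: "1", lang: "en", title: "Title", content: "Content" },
  { _id: "2", lang: "fr", title: "Subject", content: "Contenu", master: "1" },
  { _id: "3", lang: "de", title: "Titel", content: "Inhalt", master: "1" },
];

// Hypothetical helper: given any document of a translation group, return the
// whole group keyed by language. The master document has no `master` field.
function findTranslations(docs, doc) {
  const root = doc.master ?? doc._id;
  const group = {};
  for (const d of docs) {
    if (d._id === root || d.master === root) {
      group[d.lang] = d;
    }
  }
  return group;
}

// Suppose a text search for "Inhalt" matched the German document.
const hit = articles.find((d) => d.content === "Inhalt");
const group = findTranslations(articles, hit);
console.log(Object.keys(group)); // [ 'en', 'fr', 'de' ]
console.log(group.fr.title);     // Subject
```

Pointing every translation at one master keeps the link to a single field per document, so adding a language never requires touching the existing documents.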

