This message is mainly a reply to an excellent point raised by Kim
van der Linde ("Some initial thoughts"), but it also thematically
follows up my "a truly rigorous/accurate resource process design"
Post by Kim van der Linde
3. Progressive fork. In my opinion, probably the worst idea possible.
Wikipedia has 1.2 million pages. Suppose we can update 10.000 of them to
our standards in the first six months. That effectively means that when
people come to citizendium, they have a chance of one in 120 to hit an
improved page, for the rest, it is just old unreliable Wikipedia that
they will see. Aka, we will be perceived equally bad as them. For that
reason, I suggest we modify the 'article not found' page by including an
explicit link to Wikipedia, with an appropriate message. In that way, we
make explicit what is us and what is them. It is also an incentive for
citizendium editors to get as soon as possible as many as possible pages
updated and to par with our quality standards.
I couldn't agree more with this. And if the principle behind it were
implemented, that would solve quite a lot of issues that have been
raised by various people.
Citizendium should be its own distinct entity where *all* of the
content it stores should be vetted, or at least each content item
should be manually given the once-over by whomever submits it. No
detail or section should be added that wasn't added intentionally on
its own merits. If a visitor sees any content on our site, they
would know that we stand behind it, notwithstanding that anything
could still possibly have errors.
A mass copy of Wikipedia is the wrong approach to take for several
reasons, some of which reiterate Kim's comments:
1. It needs to be in black and white as to which content we stand by
vs which content we have no opinion on due to being copied. Nothing
spells that out more than a big sign reading "article doesn't exist
in Citizendium yet, meanwhile here is a helpful link to an
appropriate Wikipedia article". (And that link has a target="_blank" so
Citizendium remains open in their browser when they go to Wikipedia
for that article, and they don't forget about us once there.)
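A minimal sketch of such a "not found" page, purely as an illustration. The function name `not_found_page`, the exact wording, and the `rel="noopener"` attribute are my own assumptions and not part of any existing Citizendium code; the only firm parts are the explicit Wikipedia link and that it opens in a new window.

```python
from html import escape
from urllib.parse import quote


def not_found_page(title: str, wikipedia_lang: str = "en") -> str:
    """Return HTML for a hypothetical 'article not found' page."""
    # Wikipedia article URLs use underscores in place of spaces.
    wp_url = (
        f"https://{wikipedia_lang}.wikipedia.org/wiki/"
        f"{quote(title.replace(' ', '_'))}"
    )
    safe_title = escape(title)
    return (
        f"<h1>{safe_title}</h1>\n"
        "<p>This article doesn't exist in Citizendium yet.</p>\n"
        # target="_blank" keeps Citizendium open while the visitor
        # reads the external article in a new window or tab.
        f'<p>Meanwhile, here is the <a href="{escape(wp_url)}" '
        f'target="_blank" rel="noopener">Wikipedia article on '
        f"{safe_title}</a> (external, not vetted by us).</p>\n"
    )


if __name__ == "__main__":
    print(not_found_page("Ant"))
```

The point of the sketch is only that the page says in plain words which content is ours and which is theirs, rather than relying on colour or layout.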
Simply coloring text differently depending on ours vs theirs, or
having 2 versions of a page (each of which some of us have
suggested), does not stand out at all, and 99% of the public won't
notice it, so they will just think that most of our content is bad,
or is "the same" as Wikipedia; and since the latter has (for now)
better servers, the public will go there instead as we haven't really
2. A database which is initially small (because it only has our own
vetted content) requires fewer resources to maintain and serve. We
can start out effectively with a smaller initial investment in
servers. It gives us more time to scale up and work through kinks in
our system, and it better survives any "slashdot effect". We only
really get hits on our server for articles we have vetted, and for
the 99% plus that we haven't, we don't get most of the hits.
3. The good content stands out more, and so any experts will feel
more justified or rewarded for contributing, since their goodness
won't be lost in a sea of mediocrity. Similarly for non-expert
4. Bringing in content piecemeal also allows us to rethink how it is
organized, so we can have a second chance to pick good article names
or otherwise organize tags or wider topics for articles differently,
where we want to.
In particular, we could deal with homonyms right from the start, such
that *all* normal articles have urls like "Foo (Bar)", and *all* urls
with just "Foo" are disambiguation pages, even if there is only one
meaning; there could be others later, so we should think ahead and
design to scale, and that is one way to do it.
If we do deal with article urls that way from the start, then all
links can initially be "Foo (Bar)" and hence no one clicking on an
older url is likely to come to a disambiguation page due to a "Foo"
being split up.
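To make the naming rule concrete, here is a rough sketch, assuming article titles are stored as plain strings of the form "Foo (Bar)". The names `build_disambiguation` and `resolve` are hypothetical, and a real system would look titles up in the database rather than in a list.

```python
import re
from collections import defaultdict

# A normal article title is a name plus a qualifier, e.g. "Foo (Bar)".
QUALIFIED = re.compile(r"^(?P<name>.+?) \((?P<qualifier>.+)\)$")


def build_disambiguation(titles):
    """Group qualified article titles under their bare name."""
    index = defaultdict(list)
    for title in titles:
        match = QUALIFIED.match(title)
        if match is None:
            # Under this scheme every normal article carries a qualifier.
            raise ValueError(f"article title lacks a qualifier: {title!r}")
        index[match.group("name")].append(title)
    return {name: sorted(found) for name, found in index.items()}


def resolve(path, index):
    """Classify a requested path as an article or a disambiguation page."""
    if QUALIFIED.match(path):
        # This sketch does not check that the article actually exists.
        return ("article", [path])
    # A bare name is always a disambiguation page, even with one meaning.
    return ("disambiguation", index.get(path, []))


if __name__ == "__main__":
    idx = build_disambiguation(
        ["Mercury (Planet)", "Mercury (Element)", "Ant (Insect)"]
    )
    print(resolve("Mercury", idx))
    print(resolve("Ant", idx))
```

Note that "Ant" resolves to a disambiguation page listing a single entry, so a second meaning can be added later without breaking any existing "Ant (Insect)" link.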
5. We can and should be treating Wikipedia as a SOURCE, just like
anything else, and not privilege them above other sources by us being
Though that's not to say that Wikipedia can't be a source that we
copy text from more or less verbatim, since its license and ours
allow us to do so. But it would still be manual, though our techs
could have utilities to make common copying operations easier, such
as a "copy from the Wikipedia article at this (possibly different)
article url" feature that inputs directly into an editing form.
Wikipedia should explicitly be an EXTERNAL source, like everything
else, and be cited in the sources section like anything else. This
also helps our visitors get a more accurate impression of how we know
what we know ... because Wikipedia (external) and Foo (external) and
Bar (external) say so.
It is a lot easier to treat Wikipedia as a source if we aren't a
"fork" of them.
Incidentally, for the inevitable copying back, Wikipedia should also
consider Citizendium to be a source of theirs, in a similar fashion
to what I prescribe.
6. This approach lets us be multi-national right from the start,
rather than just starting off as a clone of the English Wikipedia
data. In the spirit of free and open-source software (FOSS), each of
us will "scratch our own itch", and want to write or edit articles in
any language we feel like, so the technical capacity should be there
from the start.
While there may have been an argument for English-only in Larry's
proposal as a matter of simplicity, I don't see that being necessary
when we're not copying Wikipedia whole-hog from the start. So we can
be multi-national *now*.
On a related note to #4, I suggest having the multiple languages
integrated in a single database and interface, rather than having a
separate "site" for each language.
That is, users can choose a language from a menu when they enter, but
the only distinction between what they get for doing so would be what
language the Citizendium menus and buttons and standard components
are in, as well as a vote for preferred language of content.
Moreover, we only need one internet domain, and a language choice can
be part of the path; eg: http://citizendium.org/en/whatever .
Similarly, all articles in the database should be inherently
multi-lingual, in a fashion reminiscent of multi-lingual software,
taking for one example how Mac OS X works. For each article (the
"Foo (Bar)" mentioned in #4), there exist one or more "language
resources", each of which is a version of the article written in one
language. A visitor by default sees the version in their chosen
language, but if one doesn't exist, they can be given the option to
either go to the version in a different language, or add a
translation (sort of like adding a new article), or alternately see
the external Wikipedia version in that language.
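As a sketch of how the language resources and the fallback could hang together: the `Article` class, the `view` function and the option labels below are illustrative assumptions of mine, and the language-first path layout simply follows the /en/whatever example given earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    key: str  # language-independent identifier, e.g. "Foo (Bar)"
    resources: dict = field(default_factory=dict)  # language code -> text


def article_path(lang, key):
    """Paths are identical across languages save for the language part."""
    return f"/{lang}/{key}"


def view(article, lang):
    """Return the requested language version, or the fallback options."""
    text = article.resources.get(lang)
    if text is not None:
        return {"lang": lang, "text": text, "options": []}
    # No version in the chosen language: offer the three alternatives.
    options = [("read_in", other) for other in sorted(article.resources)]
    options.append(("add_translation", lang))
    options.append(("see_wikipedia", lang))
    return {"lang": lang, "text": None, "options": options}


if __name__ == "__main__":
    foo = Article("Foo (Bar)", {"en": "English text", "fr": "Texte en francais"})
    print(article_path("fr", foo.key))
    print(view(foo, "de")["options"])
```

Because every version hangs off the one article key, listing which translations exist is just a matter of listing the keys of `resources`.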
The idea is that there should be a clear symmetry between all
language versions of an article, where they contain the same info,
but in different text, and also, that they have identical urls save
for a language component. In addition, article elements like images
that are language agnostic can be applied to the collection as a
whole, and appear in each version. A key point is that it should be
easy to tell for any given article which translations exist.
Note that, in real life, these will likely be out of sync, and
regardless there should be a pseudo-source-citing record that says
whether a particular language resource is the original expert-written
article, or whether it is a translation of a different resource, and
which one. Then people looking at any particular language can follow
the source chain from our translation to our original and so on.
Translators can make mistakes. And in any event, this citation would
include when the translation was done, so we know how accurate a
version is relative to its other-language original. Of course, it
can go both ways, with some edits being originally done in one
language, B, then translated to A, while other parts go from A to B.
This needs to be appropriately marked.
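One possible shape for that pseudo-source-citing record, as a sketch only. The `Provenance` class and its field names are hypothetical, and it keeps one record per language version, which is a simplification: tracking edits that flow in both directions would need a record per edit rather than per version.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    lang: str
    translated_from: Optional[str] = None  # None means an original
    translated_on: Optional[date] = None   # when the translation was done


def source_chain(lang, records):
    """Follow translations back from one language to the original."""
    chain = [lang]
    seen = {lang}
    while records[lang].translated_from is not None:
        lang = records[lang].translated_from
        if lang in seen:  # guard against records that form a cycle
            break
        seen.add(lang)
        chain.append(lang)
    return chain


def is_stale(record, source_last_edited):
    """True if the source was edited after this translation was made."""
    if record.translated_on is None:
        return False  # originals have no source to fall behind
    return record.translated_on < source_last_edited


if __name__ == "__main__":
    records = {
        "en": Provenance("en"),
        "fr": Provenance("fr", "en", date(2006, 9, 1)),
        "de": Provenance("de", "fr", date(2006, 9, 10)),
    }
    print(source_chain("de", records))
    print(is_stale(records["fr"], date(2006, 9, 15)))
```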
Now, assuming this is different from how Wikipedia operates, this is
another advantage of starting from the bottom as I suggest, so the
pieces can be appropriately put in place.
It also perhaps stands to reason that, while any version can be an
original, it would be recommended to do all primary edits in the
English version when the editor understands enough English to
reliably do so, so as to make some things simpler; English is the
nearest to a universal language that we have.
In regards to the point one may raise about how the same article
names may be different in different languages (some are, some
aren't), we can still set up things technically so that the right
7. I had other reasons, but they slip my mind at the moment ... more
Post by Kim van der Linde
1. anonymous editing. This is only a problem at Wikipedia because it
facilitates vandalism and insertion of nonsense, but some anonymous
editors are adding real good content. The problem comes with the fact
that what they add is immediately visible for the world, and as such,
becomes a magnet for vandalism. If we get a level of editorial control
(approve flagging), the vandalism will never reach the published pages.
As such, we are trying to do a double kill on vandalism etc.
2. Real world names. Nice, but unnecessary. For the editors I can see
this requirement, but along the same line as for anonymous editing,
double kill of a problem, and potentially chasing people away.
Following the principle of saying just what we actually know, I
propose that real world names not be a strict requirement, but a soft
requirement and/or a strong recommendation.
One reason is that people can always lie, and pretend to put in a
real name when they actually haven't. Second is that there is
sometimes a legitimate reason for someone to hide who they are, such
as if what they say will get them arrested or killed by an oppressive
All that said, this is what I propose:
1. Everyone who adds or edits anything on Citizendium must have a
named account on Citizendium. No one can add or edit anything
without logging in.
2. When someone creates a Citizendium account, they supply a Public
Name, which is what they are known by on the system, regardless of
whether it is their real name or not (people will usually use their
real name, by recommendation), and an email address. The address has
to be verified to activate the account (admins may need to contact
them); we send an auto-message to it with some code, so that we know
it is their address and they spelled it correctly. Then they provide
a password, and separately there are various profile-type fields
where they can put in their real name and CV and whatever.
3. Like a typical article, an account has its own corresponding page
with all the profile-type details on it, and it can include its own
cited sources that refer to the person's web site or otherwise
external proof that they are who they say they are.
4. All Citizendium add/edit/etc activity is attached to the account,
for crediting and by way of that to their CV or whatever.
5. Of course, someone could still lie about who they are, but what
info they provide or how verifiable it is will contribute to the
validity of any changes they make to the system, and what changes
they make are tracked to the account. Generally speaking, the public
or other systems can put more weight on one account than another, for
determining experts et al, based on this proof attached to the
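The account scheme in points 1 to 5 could be sketched roughly as follows. Everything here (the `Account` class, its field names, the use of a random token as the emailed code) is an assumption for illustration; password handling is left out entirely, and a real system would store a salted hash and actually send the email.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Account:
    public_name: str  # what the person is known by; need not be a real name
    email: str
    # Code that would be sent in the automatic verification message.
    verification_code: str = field(
        default_factory=lambda: secrets.token_urlsafe(16)
    )
    active: bool = False
    profile: dict = field(default_factory=dict)  # real name, CV, sources
    edits: list = field(default_factory=list)    # activity tied to account

    def verify(self, code):
        """Activate on a matching code; return whether the account is active."""
        # Constant-time comparison of the supplied and stored codes.
        if secrets.compare_digest(
            code.encode(), self.verification_code.encode()
        ):
            self.active = True
        return self.active

    def record_edit(self, article, summary):
        """Attach an add/edit to this account; unverified accounts cannot edit."""
        if not self.active:
            raise PermissionError("account email has not been verified")
        self.edits.append((article, summary))


if __name__ == "__main__":
    acct = Account("Example Person", "person@example.org")
    acct.verify(acct.verification_code)  # stands in for clicking the email
    acct.record_edit("Foo (Bar)", "initial version")
    print(acct.active, len(acct.edits))
```

The profile is deliberately just free-form data: what someone puts there, and how well it can be checked, is what others would weigh when judging the account.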
So, that's about that for today.
I remind you, as mentioned in my other posts, that I have thought
about these matters for a *long* time (since 1998 at least) as I work
towards my own system that implements such things, possibly to
supplant wikis some day.
Consideration and feedback appreciated.
Thank you. -- Darren Duncan