Types of Personalization in Portals – User Personalization

In the white paper we posted on 3/9 (Integrating ECM with Portal Technologies) I wrote a section that gave an overview of the 3 main types of personalization that are normally implemented in a portal environment.

  1. User Personalization
  2. Content Filtering Personalization
  3. Trend Analysis Personalization

In a short series of posts over the next few weeks I will go into a bit more depth on each type that I mentioned in the paper, including technical details when applicable.

First up is User Personalization.

File Naming, GUIDs, Duplication, Identity and Metadata: A Response to John O'Gorman

I love social network conversations. There has been a great one going on over on LinkedIn. In the AIIM Group for Intelligent Information Management, a conversation was started around whether or not file naming conventions are needed when we have robust EDMSs (enterprise document management systems). John O’Gorman, an Information Integration specialist, made a provocative post (in the spirit of great dialog) and I responded. The answers and debate have grown and now, rather than take up the whole forum, I am posting my reply here so that you may participate as well!

If you have not read the thread, you can do so HERE (if you want to skip to the Billy vs. John debate, go to page 3). Without further ado, here is my reply:

I appreciate the engagement and invite others into the fray. I think this makes us all sharper! So in the spirit of mutual enlightenment and the disputational interrogative we engage!

We start off on common ground, agreeing that humans are *much* better suited to pattern recognition and discrimination than programs are. While programs can process vastly larger quantities of information, we can identify and consume “relevant” information more efficiently (at least for now).

1) You mention that a computer cannot have even 2 files with the same name in a folder while we can pick out one from many quite easily. I agree with the example you give, but the question was not answered and your answer imports some assumptions that aren’t necessarily so. Let me explain. I would argue (based on my pop-sci understanding) that human discrimination is facilitated by the way our brains “tag” memories with unique identifiers. We use electro-chemical “naming conventions”; programs use other conventions. Same fundamental strategy though. To this extent, the argument that the strategy computer programs use is bad because it is different from what human brains use fails. Showing a difference in result does not impugn the process, merely the efficiency or execution or a host of other limiting factors. Secondly, your example imports some assumptions that do not hold. Why do you assume the Windows file system uniqueness requirements? I work with EDMSs that can store N number of files with the same file name in the same “location” and display those in a single collection and slap a “folder” icon on top of it. Computer programs? Yep. Windows file system limits? Nope.

2) Maybe I didn’t understand when you originally stated, “the reason we put meaningful labels on anything is because no one has come up with a better alternative”. This sounded to me like you considered this the worst of no possible alternatives. I am not sure why you feel this way. As you say, it seems to work pretty well for our brains and while search isn’t up to our levels yet, it is getting there. I would also argue that basic keyword indexing (tokenization) has limitations. But aren’t you creating a straw man argument here? Why must disambiguation happen at this level? Search engines also incorporate (and are increasing their incorporation of) other weighting/prioritization/relevancy axes in order to achieve disambiguation and increase relevancy. This is why entity extraction and ontology assisted querying is so promising. Before going there though, disambiguation and prioritization happens through incorporating inbound linking, folksonomies, usage/consumption patterns and other factors. Bibliographic tracing is quite popular in higher education systems in order to figure out which concepts are derivative and which are foundational. Computer programs are doing the tracing and therefore the discrimination here.

3) I grant that, as you say, two different GUIDs only assert that the two resources to which they are associated are deemed different. But again you brought up the distinction between systems and humans. Pictures of (the same) JaneSmith at age 3 and at age 33 are very different and, *depending on your prior relationship with her*, the difference may be enough to prevent you from identifying her as the same person or (if you are her father) not enough to confuse you even for a moment. This raises two very important questions around meaningful (aka relevant) difference and identity. With a person we presume continuity of something that transcends cellular existence (personality, soul, spirit, whatever). With information what do we have? Changing a single byte in a document between versions alters checksum values, thereby creating an entirely new item (by one interpretation). But that one byte is not likely a meaningful difference and so we are comfortable with maintaining the ID of the item. So hyper content-centric identity of information seems not to be useful. Alternately, I can create an empty “unique id” for an item in my EDMS and then proceed to associate that ID with an image of JaneSmith, then a resume for JaneSmith, then a video of the Space Shuttle. That unique ID retains its uniqueness among all other items in the system but is a container for “nothing – image – document – video”. The ID is still able to be distinguished from all others in the system. The difference is the intentionality with which it is assigned. Humans import definition by creating the ID and therefore create the uniqueness regardless of what “thing” the identifier points to. This is extrinsic identification rather than intrinsic identification. I think there is a place for both kinds of identity in our world, but EDMSs generally focus on and utilize extrinsic identity. This is because we can import purpose more easily to extrinsically identified objects (since we control them) than to intrinsically identified objects (which we have to find a use for).
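To make the intrinsic vs. extrinsic distinction concrete, here is a minimal Python sketch. It is an illustration only, not how any particular EDMS works: the function names and sample bytes are hypothetical. The intrinsic ID is assumed to be a SHA-256 checksum of the content, and the extrinsic ID is assumed to be a random UUID that is assigned once and kept while the content behind it changes.

```python
import hashlib
import uuid


def intrinsic_id(content: bytes) -> str:
    """Identity derived from the content itself (a SHA-256 checksum)."""
    return hashlib.sha256(content).hexdigest()


def new_extrinsic_id() -> str:
    """Identity assigned by an act of intention, independent of content."""
    return str(uuid.uuid4())


# Two versions of a "document" that differ by a single byte (hypothetical data).
version_1 = b"Jane Smith resume, draft 1"
version_2 = b"Jane Smith resume, draft 2"

# Intrinsic identity changes with every edit, meaningful or not.
ids_differ = intrinsic_id(version_1) != intrinsic_id(version_2)

# Extrinsic identity is assigned once and survives any change of content:
# nothing, then an image, then a document, then a video.
item_id = new_extrinsic_id()
store = {item_id: None}
for payload in (b"image bytes", version_1, version_2, b"video bytes"):
    store[item_id] = payload

print(ids_differ)   # whether the two checksums differ
print(len(store))   # how many items exist under the assigned ID
```

The checksum answers "is this the same content?" while the assigned ID answers "is this the same item?", which is the question most EDMS users actually care about.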

4) I agree whole-heartedly that, as you say, “There is some philosophy in every human endeavor, just as there are some mathematics in every computer”. I, like you, enjoy engaging on this level as well! So props to us and the readers who enjoy it. I think this is an important area where we can elevate the discussion and practice in our community. So I’ll bite again. While I love talking about how interior angles of a triangle can add up to 270 when laid out on a sphere, I’m not seeing the information science analogy, but I am excited to hear it! In the meantime, I do not follow your LA example. You say that “without being told ‘Los Angeles’ (GUID h34dh23a4b7b33c8361) is different than ‘Los Angeles’ (GUID 7b33c8361h34dh23a4b) a computer is ignorant of that difference.” But I would reply that the *fact* of the different GUIDs is the discriminating factor for a computer. So by definition a computer *must* “know” that title attribute “Los Angeles” of GUID 1 is different and distinct from title attribute “Los Angeles” of GUID 2. The trouble comes up in how those GUIDs were assigned. If extrinsically assigned (i.e. through an act of intention), then we the human consumers of that information assume/rely on the idea that the difference is meaningful. If intrinsically assigned (e.g. automatically via a crawler or something else), then we get into potential duplication / overlap / synonym problems (e.g. the checksums were different but the objects were not meaningfully different). The trouble I have with this is that meaning is always imported by humans and is dependent on the scope of our problem domain. Google Earth can show the globe or my back yard. At one scope of the problem domain (e.g. where do you live?) both the “globe” and “my back yard” answers are correct. To a space alien the globe provides the appropriate level of meaning. To my mother the back yard provides the appropriate level of meaning. So by setting up the question the way you have, you seem to be assuming a common scope, which is only ever extrinsically identified and therefore unable to be held in common.

5) I think the “assigned to maintain” vs “derived to describe” strategy is very good, and the best part is that they are not mutually exclusive. I agree with you that these are “an interesting twist on randomly assigned object identifiers”. But these are quite common. They are simply at different levels of the problem domain. OWLs, most EDMSs and other relationship maps do this all the time. Using your example, ‘Management Salary Policy’ and ‘Management Gehalts Politik’ and ‘Política del Sueldo de la Gerencia’ and ‘???????? ???????? ??????????’ would share a common identifier that acts as a meta-identifier. The reason for this is that (I assume) you are using a localization example where we have 4 different translations of the same policy. In this case we humans understand that the collection, the set, is a common set and should be related and identified with a common identifier. The difference at the set level is not meaningful. At a deeper level the language difference becomes meaningful only after the set has been identified and located. Here is where we start delving into Derridean concepts of différance and what is meant by identity. Suffice it to say that this is the realm of extrinsically assigned (e.g. derived to describe) identification.

6) Here we’re getting down to brass tacks. I agree that search is a big pain point for most organizations. AIIM, Gartner, Forrester, Ovum, Gilbane and others all agree. But most EDMSs don’t care if multiple files with the same name are stored. I should be clear: by files with the same name I mean “file name” (e.g. BlogPost.doc or MyPresentation_Final.pptx or LinkedInReply.txt). This is because the EDMS will store that filename as an attribute but give the file a GUID. The good EDMSs (like Oracle UCM) will provide a set identifier (ContentID) that identifies the set of revisions, which may be substantially different from each other. Each revision has its own unique identifier (dID). Furthermore, ContentIDs can be auto generated, auto derived from rules / extraction processes or manually created each and every time. Additionally, content objects can be associated with each other along N number of axes for intentional purpose-based collecting/discovering/location/identification. These axes may or may not be indexable. This allows information discovery (classification, grouping, categorization) to be brought alongside information location (querying, retrieving) rather than relying simply on one or the other or requiring a serial approach of one then the other. I fundamentally disagree with your statement that “nor is it considered best practice in an EDMS to encourage or even allow contributors to randomly assign their own metadata.” First, no user ever “randomly” assigns their own metadata. At least not in true random form. Second, your statement begs the question of what is metadata and what should end users create and consume? So are Flickr tags metadata? Yes. Should users create them and be enabled to create them? YES! Are star based ratings on blog posts metadata? YES. Should users be allowed and empowered to engage with rating systems? YES! What about “comments” or “descriptions” or “due date”? I would argue that in contextually appropriate situations users should always have at least the option, if not the requirement, to add descriptive and intentional metadata to their content objects. The more that entity extraction systems (e.g. OpenCalais, GATE, Clarabridge etc) become commonplace, the easier we can make it for people. But I do not foresee a time when information creators will be able to, or should, stop describing and classifying what they have created or consumed.
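The set-identifier and revision-identifier idea can be sketched in a few lines of Python. This is a hypothetical in-memory model, not the Oracle UCM API: the `Repository` and `Revision` classes, the `check_in` method and the sample IDs are all invented for illustration. Only the ContentID and dID concepts come from the discussion above.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Revision:
    d_id: int         # unique per revision (plays the role of dID)
    content_id: str   # shared by all revisions of one item (ContentID)
    file_name: str    # just an attribute; no uniqueness required
    tags: set = field(default_factory=set)  # user-supplied metadata


class Repository:
    """Hypothetical in-memory store, not a real EDMS API."""

    def __init__(self):
        self._next_d_id = itertools.count(1)
        self.revisions = []

    def check_in(self, content_id, file_name, tags=()):
        rev = Revision(next(self._next_d_id), content_id, file_name, set(tags))
        self.revisions.append(rev)
        return rev

    def revisions_of(self, content_id):
        # Information location: fetch the whole revision set by its set ID.
        return [r for r in self.revisions if r.content_id == content_id]

    def find_by_tag(self, tag):
        # Information discovery: group items along a user-defined axis.
        return [r for r in self.revisions if tag in r.tags]


repo = Repository()
# Three check-ins that all carry the same file name.
repo.check_in("POLICY-001", "BlogPost.doc", tags={"draft"})
repo.check_in("POLICY-001", "BlogPost.doc", tags={"final", "policy"})
repo.check_in("MEMO-042", "BlogPost.doc", tags={"policy"})

print([r.d_id for r in repo.revisions_of("POLICY-001")])  # [1, 2]
print([r.d_id for r in repo.find_by_tag("policy")])       # [2, 3]
```

Because the file name is only an attribute, three files called BlogPost.doc coexist without conflict, and user-assigned tags provide a second axis for finding them.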

7) You write that “This issue of whether or not to have a convention for naming files is symptomatic of a systemic problem.” I agree wholeheartedly. But I disagree that this means that all systems fail to solve the problem. Indeed, I am very confident that with technologies like the Oracle ECM system and Fishbowl Solutions’ add-on modules such as our Subscription notifier, Workflow solution set, CollabPoint and Advanced User Security Mapping, along with our solutions like Contract Management, Policies and Procedures, Admissions Office Onboarding and Research Solutions, we can and do solve the business problems around file naming conventions, information location and efficiency boosting.

So I will echo your sentiment at the end, with which I agree unequivocally: In the spirit of the community of intelligent information management, I’m just sayin’…

Fishbowl Solutions Helps You Get Fit With Your Enterprise Information Management Strategy in 2010

On February 10, 2010 Fishbowl Solutions is hosting an Enterprise Information Management (EIM) Bootcamp in Golden Valley, Minnesota. Join author, E20 expert, and your drill sergeant, Billy Cripe, as he takes us through the evolution of Enterprise Information Management. Please register to:

  • Get Ripped as we address key information management considerations for 2010
  • Build organizational endurance as we outline must-have factors for EIM success
  • Become a Leader as you bring back a firm understanding of how to implement expert solutions

This free event will be held 8am-noon and we will even fuel you for the day with a complimentary breakfast.

You can also stick around for a special “Ask the Experts” session and pit your toughest questions against our strongest people.

Please visit the Fishbowl website to learn more, or contact Amanda to register now. We hope to see you there!

Enterprise Information Management in 2010

Surveys are great things and we all seem to love them. We inherently position ourselves when we read them, identifying ourselves as clever enough to agree with the majority or cool enough to know what the rest of them OBVIOUSLY missed. Either way we’re enchanted with surveys and results.

The real power behind them, though, is in the trends that they foreshadow. What I find interesting is that, while survey results don’t always agree, when you overlap the results some very interesting trends emerge.

One that I find particularly interesting is what I will call the evolution of ECM (enterprise content management) to EIM (enterprise information management). Many of us in the ECM industry have witnessed the slow growth of how content is managed, the capabilities that are baked into various ECM technologies and the pervasive nature of the core ECM systems.

But the content itself is changing as well. We are no longer simply managing documents. It’s not about just scanning a paper invoice. Rather we are using digital images, videos and audio files to degrees never before imagined.

Finally, the systems we use to access information are evolving as well. Enterprises should not be content with rolling out a host of point solutions for different projects. They should not tolerate unmanaged or “organic” growth of systems that spring up overnight and are orphaned just as quickly. Rather, enterprises are “smartening up” when it comes to the user experiences and information quality considerations inherent in good information architecture. This is why we see some of the trends we do. For instance, ECM at the core of EIM systems is vital and surveys say spending there will continue to grow.

According to Forrester the key areas for ECM investment in 2010 are Collaboration, Search and Compliance. AIIM predicts that Capture will be an area for ECM investment.

Forrester Results via CMS Wire

AIIM Results

Well, I think that the key trend only appears when you look at these separate analyses together.

Why is content sharing so important to Forrester and knowledge management so important to AIIM?

The real problem companies are trying to solve is the contextual availability of information. This means providing the right information to the right people at the right time, even if they don’t know they need it. This information might have started out in paper form, or as a rating in someone’s head, or a trouble ticket in a support CRM system.

Forrester surveyed 170 ECM decision makers, while AIIM surveyed 882 individual members. Both surveys revealed that organizations are planning on spending money on ECM in the New Year. This is good news for everyone in the ECM space! While companies are not going to simply hand blank checks to ECM vendors, they will be looking for ways to leverage prior ECM license purchases and turn more of their shelf-ware into real, efficiency-boosting systems. The premium, though, is on business solutions that are able to tap the power of sophisticated technology ecosystems while providing simplicity and efficiency for the everyday users. This simplified sophistication is one key way in which collaboration technology is poised to win big this year. The ability to turn a ROI is vital.

Gartner agrees. “In 2009, CEOs initially placed cost cutting at the top of their priorities to cope with the sudden and severe recession. In 2010, the focus for 71 per cent of business leaders is a return to revenue growth.”

One example of this kind of tech-to-efficiency-to-ROI success comes from one of our financial industry customers. They set out to consolidate their information architecture while enabling employees to collaboratively interact with information and each other. Their consolidation saves them 25 percent each year on administrative costs and employees are much happier and more productive than ever before.

The frequency with which organizations are realizing they NEED to provide their people with the contextual availability of information is increasing dramatically, and it seems 2010 is the year to align their sophisticated technologies with the needs of the everyday user.