UML Software Engineering Organization (UML软件工程组织)

Best Practices for Data Modeling
(Selected from 时代朝阳网)

If you ask application developers what the most important task is in developing new or enhanced applications for institutional data and processes, almost every time they will tell you it is the initial analysis of client requirements. Before purchasing any software, before storing a single byte of data in a database, analysis of the client's requirements is paramount to developing the appropriate solution. Time invested in analysis generally pays off in the effectiveness of the resulting application. Since the early 1960s, and despite the waves of change since then, one thing has remained constant -- the initial analysis is still the most important activity that an application designer undertakes. It gives the developer the chance to design an effective, spectacular application, no holds barred.

This analysis takes various forms. Usually the application developer has a feeling about what form the analysis should take. It may simply require a phone call to the client asking, "Do you want to add or subtract 5 percent from all the employees' salaries?" Or it may require organizing week-long meetings with clients to analyze their requirements collectively. Overkill is rarely a problem in the analysis stage, since erring on the side of thoroughness helps ensure that all the relevant people are involved. The worst thing a developer can do is leave a key person out of the requirements analysis. Everyone's knowledge and experience are needed during this analysis, and their presence or absence can make or break its success.

The participants in the analysis bring their much-needed knowledge and experience into the meeting, but it is also important to ask them to "leave their baggage at the door." Excess baggage such as idealization of the features or constraints of the current application can impede the design of a new and improved application, one without those same "time-honored" constraints. While the developer recognizes that there are always rules, regulations, and constraints, they must also examine these constraints for their continuing validity within the new application.

Data Modeling

Most people involved in application development follow some kind of methodology. A methodology is a prescribed set of processes through which the developer analyzes the client's requirements and develops an application. Major database vendors and computer gurus all practice and promote their own methodology. Some database vendors even make their analysis, design, and development tools conform to a particular methodology. If you are using the tools of a particular vendor, it may be easier to follow their methodology as well. For example, when CNS develops and supports Oracle database applications it uses the Oracle toolset. Accordingly, CNS follows Oracle's CASE*Method application development methodology (or a reasonable facsimile thereof).

One technique commonly used in analyzing the client's requirements is data modeling. The purpose of data modeling is to develop an accurate model, or graphical representation, of the client's information needs and business processes. The data model acts as a framework for the development of the new or enhanced application. There are almost as many methods of data modeling as there are application development methodologies. CNS uses the Oracle CASE*Method for its data modeling.

As time goes by, applications tend to accrue new layers, just like an onion. We develop more paper pushing and report printing, adding new layers of functions and features. Soon we can see the core of the application, where its essence lies, only with difficulty. Around the core we see layer upon layer, protecting and nurturing it but ultimately obscuring it. Our systems and applications often fall victim to these protective, concealing processes. The essence of an application is lost in the shuffle of paper and the accretion of day-to-day changes. Data modeling encourages both the developer and the client to tear off these excess layers and to revisit the essence or purpose of the application. The new analysis determines what needs to feed into the core purpose and what feeds from it.

Application Audience and Services

After participants at CNS-sponsored application analysis meetings agree on a scope and objectives statement, we find it helpful to identify the audience of the application. To whom does the application offer the services we are modeling? Who is affected by the application? Answers to these and similar questions help the participants stay focused on the desired results.

After assembling an audience list, we then develop a list of services provided by the application. This list includes the services of the existing application and any desired future services in the new application. From this list, we model the information requirements of each service. To do this, it is useful to first identify the three most important services of the application, and then of those three, the single most important service. Eventually all of the services will be modeled. Focusing our data modeling on one service just gives us a starting point.

Entities

The next step in modeling a service or process is to identify the entities involved in that process. An entity is a thing or object of significance to the business, whether real or imagined, about which the business must collect and maintain data, or about which information needs to be known or held. An entity may be a tangible or real object like a person or a building; it may be an activity like an appointment or an operation; it may be conceptual, as in a cost center or an organizational unit.

Whatever is chosen as an entity must be described in real terms. It must be uniquely identifiable. That is, each instance or occurrence of an entity must be separate and distinctly identifiable from all other instances of that type of entity.

For example, if we were designing a computerized application for the care of plants in a greenhouse, one of its processes might be tracking plant waterings. Within that process, there are two entities: the Plant entity and the Watering entity. A Plant has significance as living flora valued for its beauty. Each Plant is uniquely identified by its biological name, or by some other unique reference to it. A Watering has significance as an application of water to a plant. Each Watering is uniquely identified by the date and time of its application.
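The greenhouse entities and their unique identification can be sketched in Python. This is only an illustration, not part of the CASE*Method: the class and field names, the helper `all_unique`, and the sample plants and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Plant:
    biological_name: str  # assumed here to be the unique identifier of a Plant


@dataclass(frozen=True)
class Watering:
    applied_at: datetime  # date and time of application identify a Watering


def all_unique(occurrences):
    """Return True if no two occurrences share the same identifier values."""
    occurrences = list(occurrences)
    return len(set(occurrences)) == len(occurrences)


# Hypothetical sample data.
plants = [Plant("Ficus benjamina"), Plant("Monstera deliciosa")]
waterings = [
    Watering(datetime(2024, 5, 1, 8, 0)),
    Watering(datetime(2024, 5, 1, 17, 30)),
]
duplicate_plants = plants + [Plant("Ficus benjamina")]

print(all_unique(plants))
print(all_unique(duplicate_plants))
```

Two occurrences with the same identifier values compare as equal, so a duplicate shows up as a set that is smaller than the list it was built from.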

Attributes

After you identify an entity, you describe it in real terms through its attributes. An attribute is any detail that serves to identify, qualify, classify, quantify, or otherwise express the state of an entity occurrence or a relationship. Attributes are specific pieces of information which need to be known or held.

An attribute is either required or optional. When it is required, a value must be known for each entity occurrence. When it is optional, a value may be known for each entity occurrence, but it does not have to be. For example, some attributes for Plant are: description, date of acquisition, flowering or non-flowering, and pot size. The description is required for every Plant. The pot size is optional since some plants do not come in pots. Similarly, some of Watering's attributes are: date and time of application, amount of water, and water temperature. The date and time are required for every Watering. The water temperature is optional since we do not always check it before watering some plants.
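As a rough sketch of the required/optional distinction, the two entities might be written as Python dataclasses in which optional attributes default to `None`. Only the attributes whose optionality is stated above are included. The units in the field names, the helper `missing_required`, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Plant:
    description: str                     # required: every Plant has one
    pot_size_cm: Optional[float] = None  # optional: some plants have no pot


@dataclass
class Watering:
    applied_at: datetime                         # required: date and time
    water_temperature_c: Optional[float] = None  # optional: not always checked


def missing_required(occurrence, required):
    """Return the names of required attributes that have no value."""
    return [name for name in required if getattr(occurrence, name) in (None, "")]


# Hypothetical sample occurrences.
fern = Plant(description="Boston fern")
unnamed = Plant(description="")
morning = Watering(applied_at=datetime(2024, 5, 1, 8, 0), water_temperature_c=18.5)

print(missing_required(fern, ["description"]))
print(missing_required(unnamed, ["description"]))
```

An optional attribute can stay empty without making the occurrence invalid, while an empty required attribute is reported by name.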

The attributes reflect the need for the information they provide. In the analysis meeting, the participants should list as many attributes as possible. Later they can weed out those that do not apply to the application, or those that the client is not prepared to spend resources collecting and maintaining. The participants come to an agreement on which attributes belong with an entity, as well as which attributes are required or optional.

The attribute or attributes that uniquely identify an occurrence of an entity form its primary key. If no such attribute exists naturally, a new attribute, such as an ID number or code, is defined for that purpose.
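When no natural identifier exists, a generated ID can serve as the key. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name, column names, and the helper `insert_with_id` are hypothetical, and the way SQLite assigns IDs is specific to that engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE plant (
        plant_id    INTEGER PRIMARY KEY,  -- generated ID number used as the key
        description TEXT NOT NULL
    )
    """
)

# Two plants with the same description still get distinct key values.
conn.execute("INSERT INTO plant (description) VALUES (?)", ("Boston fern",))
conn.execute("INSERT INTO plant (description) VALUES (?)", ("Boston fern",))
ids = [row[0] for row in conn.execute("SELECT plant_id FROM plant ORDER BY plant_id")]


def insert_with_id(plant_id, description):
    """Try to insert a plant with an explicit ID; return False if the key is taken."""
    try:
        conn.execute(
            "INSERT INTO plant (plant_id, description) VALUES (?, ?)",
            (plant_id, description),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(ids)
print(insert_with_id(1, "Orchid"))
```

The database rejects a second row with an existing key value, which is what keeps each occurrence distinct.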

Relationships

After two or more entities are identified and defined with attributes, the participants determine if a relationship exists between the entities. A relationship is any association, linkage, or connection between the entities of interest to the business; it is a two-directional, significant association between two entities, or between an entity and itself. Each relationship has a name, an optionality (optional or mandatory), and a degree (how many). A relationship is described in real terms.

Rarely will there be a relationship between every entity and every other entity in an application. If there are only two or three entities, then perhaps they will all be related to one another. In a larger application, an entity is typically related to only some of the others.

Assigning a name, an optionality, and a degree to a relationship helps confirm the validity of that relationship. If you cannot give a relationship all these things, then perhaps there really is no relationship at all. For example, there is a relationship between Plant and Watering. Each Plant may be given one or more Waterings. Each Watering must be for one and only one specific Plant.

Entity Relationship Diagrams

To visually record the entities and the relationships between them, an entity relationship diagram, or ERD, is drawn. An ERD is a pictorial representation of the entities and the relationships between them. It allows the participants in the meeting to easily see the information structure of the application. Later, the project team uses the ERD to design the database and tables. Knowing how to read an ERD is very important. If the ERD contains mistakes or omits a relationship, the application will fail in that respect. Learning to read an ERD comes quickly, even though the notation looks somewhat cryptic at first.

Each entity is drawn in a box. Each relationship is drawn as a line between entities. The relationship between Plant and Watering is drawn on the ERD as follows:

[Figure omitted: ERD of the relationship between Plant and Watering]

Since a relationship is between two entities, an ERD shows how one entity relates to the other, and vice versa. Reading an ERD relationship means you have to read it from one entity to the other, and then from the other to the first. Each style and mark on the relationship line has some significance to the relationship and its reading. Half the relationship line belongs to the entity on that side of the line. The other half belongs to the other entity on the other side of the line.

When you read a relationship, start with one entity and note the line style starting at that entity. Ignore the latter half of the line's style, since it's there for you to come back the other way. A solid line at an entity represents a mandatory relationship. In the example above, each Watering must be for one and only one Plant. A dotted line at an entity represents an optional relationship. Each Plant may be given one or more Waterings.

The way in which the relationship line connects to an entity is significant. If it connects with a single line, it represents one and only one occurrence of that entity. In the example, each Watering must be for one and only one Plant. If the relationship line connects with three prongs, i.e., a crow's foot, it represents one or more occurrences of that entity. Each Plant may be given one or more Waterings. As long as both statements are true, you know you have modeled the relationship properly.
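The reading rules above are mechanical enough to express in a few lines of Python. This is a minimal sketch: the class `RelationshipEnd` and its field names are hypothetical, and plurals are formed by naively appending "s".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One half of a relationship line, read from `source` toward `target`."""

    source: str
    name: str        # relationship name, e.g. "given" or "for"
    target: str
    mandatory: bool  # solid line at the source end: "must be"; dotted: "may be"
    many: bool       # crow's foot at the target end: "one or more"

    def read(self) -> str:
        optionality = "must be" if self.mandatory else "may be"
        if self.many:
            degree = f"one or more {self.target}s"  # naive plural, an assumption
        else:
            degree = f"one and only one {self.target}"
        return f"Each {self.source} {optionality} {self.name} {degree}."


plant_to_watering = RelationshipEnd("Plant", "given", "Watering", mandatory=False, many=True)
watering_to_plant = RelationshipEnd("Watering", "for", "Plant", mandatory=True, many=False)

print(plant_to_watering.read())  # Each Plant may be given one or more Waterings.
print(watering_to_plant.read())  # Each Watering must be for one and only one Plant.
```

Each direction takes its optionality from the line style at the entity where the reading starts and its degree from how the line connects at the far entity.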

In the relationship between Plant and Watering, there are two relationship statements. One is: each Watering must be for one and only one Plant. These are the parts of the ERD that this statement uses:

[Figure omitted: the parts of the ERD used by the first statement]

The second statement is: each Plant may be given one or more Waterings. The parts of the ERD that this statement uses are:

[Figure omitted: the parts of the ERD used by the second statement]

After some experience, you learn to ask the appropriate questions to determine if two entities are related to each other, and the degree of that relationship. After agreeing on the entities and their relationships, the process of identifying more entities, describing them, and determining their relationships continues until all of the services of the application have been examined. The data model remains software and hardware independent.

Many-to-Many Relationships

There are different types of relationships. The greenhouse example showed a one-to-many relationship between Plant and Watering, which is many-to-one when read from the other direction. Two other relationship types commonly found in data models are one-to-one and many-to-many. In a one-to-one relationship, each occurrence of either entity is related to one and only one occurrence of the other. In a many-to-many relationship, each occurrence of one entity can be related to multiple occurrences of the other, and vice versa.
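To make the relationship types concrete, the following sketch classifies the degree of a relationship from a list of observed pairs of related occurrences. The function `classify` and the sample pairs are hypothetical. Sample data can only suggest a degree; the data model records what the business allows, which a small sample may not reveal.

```python
from collections import defaultdict


def classify(pairs):
    """Classify the degree of a relationship from observed (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    left_has_many = any(len(rights) > 1 for rights in rights_per_left.values())
    right_has_many = any(len(lefts) > 1 for lefts in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical observations: (plant, watering) pairs.
plant_waterings = [("fern", "w1"), ("fern", "w2"), ("orchid", "w3")]
# Generic pairs where both sides repeat with different partners.
crossed = [("a", "x"), ("a", "y"), ("b", "x")]

print(classify(plant_waterings))
print(classify(crossed))
```

Swapping the two sides of every pair turns a one-to-many result into many-to-one, which mirrors reading the same relationship line from the other entity.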

An example of a many-to-many relationship in the greenhouse plant application is between the Plant and Additive entities. Each Plant may be treated with one or more Additives. Each Additive may be given to one or more Plants. The ERD for this relationship is shown below.

[Figure omitted: ERD of the many-to-many relationship between Plant and Additive]

Many-to-many relationships cannot be directly converted into database tables and relationships. This is a restriction of the database systems, not of the application. The development team has to resolve the many-to-many relationship before it can continue with the database development. If you identify a many-to-many relationship in your analysis meeting, you should try to resolve it in the meeting. The participants can usually find a fitting entity to provide the resolution.

To resolve a many-to-many relationship means to convert it into two one-to-many relationships. A new entity, referred to as an intersection entity, comes between the two original entities, and each original entity has a one-to-many relationship with it. The intersection entity allows for every possible matched occurrence of the two entities. Sometimes the intersection entity represents a point or passage in time.

The Plant-Additive many-to-many relationship above is resolved in the following ERD:

[Figure omitted: ERD with the Treatment intersection entity between Plant and Additive]

With these new relationships, Plant is now related to Treatment. Each Plant may be given one or more Treatments. Each Treatment must be given to one and only one Plant. Additive is also related to Treatment. Each Additive may be used in one or more Treatments. Each Treatment must consist of one and only one Additive. With these two new relationships, a Treatment cannot exist without both a Plant and an Additive. Treatment can occur multiple times, once for each application of an additive to a plant. To keep each Treatment unique, new attributes are defined: Treatment now has application date and time attributes, which serve as its unique identifier, or primary key. Other attributes of Treatment are the quantity and potency of the additive.
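A sketch of how the resolved model might look as tables, using Python's `sqlite3` module. The table and column names, the sample rows, and the helpers `additives_for` and `plants_for` are hypothetical. Including the two foreign keys in Treatment's primary key alongside the application date and time is an assumption of this example rather than something the model above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE plant (
        plant_id    INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE additive (
        additive_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Intersection entity: one row per Treatment of one Plant with one Additive.
    CREATE TABLE treatment (
        plant_id    INTEGER NOT NULL REFERENCES plant (plant_id),
        additive_id INTEGER NOT NULL REFERENCES additive (additive_id),
        applied_at  TEXT NOT NULL,  -- application date and time
        quantity    REAL,
        potency     REAL,
        PRIMARY KEY (plant_id, additive_id, applied_at)
    );
    INSERT INTO plant VALUES (1, 'Boston fern'), (2, 'Orchid');
    INSERT INTO additive VALUES (1, 'Fertilizer'), (2, 'Fungicide');
    INSERT INTO treatment (plant_id, additive_id, applied_at) VALUES
        (1, 1, '2024-05-01 08:00'),
        (1, 2, '2024-05-02 08:00'),
        (2, 1, '2024-05-01 09:00');
    """
)


def additives_for(plant_id):
    """Names of the Additives a Plant has been treated with."""
    rows = conn.execute(
        "SELECT DISTINCT a.name FROM treatment t "
        "JOIN additive a ON a.additive_id = t.additive_id "
        "WHERE t.plant_id = ? ORDER BY a.name",
        (plant_id,),
    )
    return [name for (name,) in rows]


def plants_for(additive_id):
    """Descriptions of the Plants an Additive has been given to."""
    rows = conn.execute(
        "SELECT DISTINCT p.description FROM treatment t "
        "JOIN plant p ON p.plant_id = t.plant_id "
        "WHERE t.additive_id = ? ORDER BY p.description",
        (additive_id,),
    )
    return [description for (description,) in rows]


print(additives_for(1))
print(plants_for(1))
```

Reading through the `treatment` table in either direction recovers the original many-to-many view, while each stored relationship is one-to-many.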

Will Data Modeling Look Good on You?

There are other processes and marks to enhance a data model besides the ones shown in this article. Many of them are used in the actual development of the database tables. The techniques shown here only provide a basic foundation for undertaking your own data modeling analysis.

Data modeling gives you the opportunity to shed the layers of processes covering up the fundamental essence of your business. Remember to leave your baggage at the door of a data modeling session. Come to the meeting with enthusiasm and a positive outlook for a new and improved application.


Copyright: UML Software Engineering Organization (UML软件工程组织)