rna-transcription approach: a few improvements (#4052)

petrem · web-flow · commit 6865784f3f7a · 2025-12-24T14:37:50.000-08:00
* rna-transcription approach: a few improvements

- Renamed `chr` to `char` in the snippets because `chr` is a built-in function and
  although shadowing it in this case may not be a problem, it is still a bad practice
  when it can be avoided.
- The dictionary-join approach mentions list comprehensions, but instead it uses a
  generator expression. Replaced this in the explanation and expanded to give the list
  comprehension based implementation along with a brief comparison.
- The overview mentions one approach is four times faster. In a brief comparison, the
  speedup ranges from 2.5x for a very short string to 60x for a string of 10^6 characters.
  Probably not worth going into the details, but 4x is just inaccurate.
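
The comparison mentioned in the last bullet could be reproduced with something like the sketch below. It is only an illustration, not the benchmark that was actually run: the helper names (`to_rna_translate`, `to_rna_join`, `compare`) and the strand lengths are assumptions, and the body of the `translate()` variant is inferred from the approach's description. The measured ratio will vary with hardware, Python version, and input.

```python
import timeit

TABLE = str.maketrans('GCTA', 'CGAU')
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def to_rna_translate(dna_strand):
    # Assumed body: the approach text says translate() is called with the table.
    return dna_strand.translate(TABLE)


def to_rna_join(dna_strand):
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)


def compare(length, repeats=5):
    """Return (translate_seconds, join_seconds) for a strand of about `length` characters."""
    strand = 'GCTA' * max(1, length // 4)
    # Run short strands many times per measurement so the timings are not dominated by noise.
    number = max(1, 100_000 // length)
    t_translate = min(timeit.repeat(lambda: to_rna_translate(strand), number=number, repeat=repeats))
    t_join = min(timeit.repeat(lambda: to_rna_join(strand), number=number, repeat=repeats))
    return t_translate, t_join


if __name__ == '__main__':
    for length in (8, 1_000, 1_000_000):
        fast, slow = compare(length)
        print(f'{length:>9} characters: join is {slow / fast:.1f}x slower than translate')
```

Taking the minimum of several repeats is a common way to reduce the influence of background load on the result.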

* convert code snippets to single quotes for consistency

* several updates following discussions

- Replaced `char` with `nucleotide` as this is terminology from the domain.
- Rephrased a "see also" link to be more screen reader friendly.
- A note about the exercise not requiring tests for invalid characters is present in one
  of the approaches. Copied it over to the other approach, for uniformity.
- Rephrased mention about performance and speedup.
- Replaced mention of ASCII with Unicode adding a brief explanation and links.
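
As a quick illustration of the ASCII vs. Unicode point above, the table built by `str.maketrans()` maps integer code points, and for the four DNA letters those code points happen to coincide with their ASCII values. The accent-stripping table is a made-up example (not part of the exercise), included only to show that non-ASCII characters work as well.

```python
table = str.maketrans('GCTA', 'CGAU')

# Keys and values are integer code points, not one-character strings.
print(table)  # {71: 67, 67: 71, 84: 65, 65: 85}
print(table == {ord('G'): ord('C'), ord('C'): ord('G'),
                ord('T'): ord('A'), ord('A'): ord('U')})  # True

# Hypothetical table outside the ASCII range: replace 'é' (code point 233) with 'e'.
accent_table = str.maketrans('é', 'e')
print('café'.translate(accent_table))  # cafe
```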

* move note regarding testing for erroneous inputs to `introduction.md`
... because it applies to the exercise in general, not a particular approach.
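
For context on why the note is approach-independent: the exercise does not test invalid input, but the two approaches would behave differently if they ever received it. The sketch below uses hypothetical function names and an assumed body for the `translate()` variant. `translate()` leaves characters that are not in the table unchanged, while the dictionary look-up raises `KeyError`.

```python
TABLE = str.maketrans('GCTA', 'CGAU')
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def to_rna_translate(dna_strand):
    # Assumed body, based on the description of the translate() approach.
    return dna_strand.translate(TABLE)


def to_rna_join(dna_strand):
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)


# 'X' is not a DNA nucleotide and is absent from both look-up structures.
print(to_rna_translate('GXA'))  # CXU ('X' passes through untouched)

try:
    to_rna_join('GXA')
except KeyError as error:
    print(f'dictionary look-up failed on {error}')  # dictionary look-up failed on 'X'
```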

Re-applying missed commits from prior cherry-pick.

* Re-applied the commits from below via cherry-pick.
convert code snippets to single quotes for consistency
diff --git a/exercises/practice/rna-transcription/.approaches/dictionary-join/content.md b/exercises/practice/rna-transcription/.approaches/dictionary-join/content.md
@@ -1,11 +1,11 @@
 # dictionary look-up with `join`
 
 ```python
-LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
+LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
 
 
 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)
 
 ```
 
@@ -16,15 +16,37 @@ but the `LOOKUP` dictionary is defined with all uppercase letters, which is the
 It indicates that the value is not intended to be changed.
 
 In the `to_rna()` function, the [`join()`][join] method is called on an empty string,
-and is passed the list created from a [list comprehension][list-comprehension].
+and is passed a [generator expression][generator-expression].
 
-The list comprehension iterates each character in the input,
+The generator expression iterates each character in the input,
 looks up the DNA character in the look-up dictionary, and outputs its matching RNA character as an element in the list.
 
-The `join()` method collects the list of RNA characters back into a string.
+The `join()` method collects the RNA characters back into a string.
 Since an empty string is the separator for the `join()`, there are no spaces between the RNA characters in the string.
 
+A generator expression is similar to a [list comprehension][list-comprehension], but instead of creating a list, it returns a generator, and iterating that generator yields the elements on the fly.
+
+A variant that uses a list comprehension is almost identical, but note the additional square brackets inside the `join()`:
+
+```python
+LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
+
+def to_rna(dna_strand):
+    return ''.join([LOOKUP[nucleotide] for nucleotide in dna_strand])
+```
+
+
+For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases.
+You can read more about [when to choose generators over list comprehensions][list-comprehension-choose-generator-expression] to dig deeper into the topic.
+
+
+~~~~exercism/note
+As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
+~~~~
+
 [dictionaries]: https://docs.python.org/3/tutorial/datastructures.html?#dictionaries
 [const]: https://realpython.com/python-constants/
 [join]: https://docs.python.org/3/library/stdtypes.html?#str.join
 [list-comprehension]: https://realpython.com/list-comprehension-python/#using-list-comprehensions
+[list-comprehension-choose-generator-expression]: https://realpython.com/list-comprehension-python/#choose-generators-for-large-datasets
+[generator-expression]: https://realpython.com/introduction-to-python-generators/#building-generators-with-generator-expressions
diff --git a/exercises/practice/rna-transcription/.approaches/dictionary-join/snippet.txt b/exercises/practice/rna-transcription/.approaches/dictionary-join/snippet.txt
@@ -1,5 +1,5 @@
-LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
+LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
 
 
 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)
diff --git a/exercises/practice/rna-transcription/.approaches/introduction.md b/exercises/practice/rna-transcription/.approaches/introduction.md
@@ -7,13 +7,13 @@ Another approach is to do a dictionary lookup on each character and join the res
 ## General guidance
 
 Whichever approach is used needs to return the RNA complement for each DNA value.
-The `translate()` method with `maketrans()` transcribes using the [ASCII][ASCII] values of the characters.
+The `translate()` method with `maketrans()` transcribes using the [Unicode][Unicode] code points of the characters.
 Using a dictionary look-up with `join()` transcribes using the string values of the characters.
 
 ## Approach: `translate()` with `maketrans()`
 
 ```python
-LOOKUP = str.maketrans("GCTA", "CGAU")
+LOOKUP = str.maketrans('GCTA', 'CGAU')
 
 
 def to_rna(dna_strand):
@@ -26,20 +26,26 @@ For more information, check the [`translate()` with `maketrans()` approach][appr
 ## Approach: dictionary look-up with `join()`
 
 ```python
-LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
+LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
 
 
 def to_rna(dna_strand):
-    return ''.join(LOOKUP[chr] for chr in dna_strand)
+    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)
 
 ```
 
 For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join].
 
 ## Which approach to use?
 
-The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach.
+If performance matters, consider using the [`translate()` with `maketrans()` approach][approach-translate-maketrans].
+How an implementation behaves in terms of performance may depend on the actual data being processed, on hardware, and other factors.
 
-[ASCII]: https://www.asciitable.com/
+
+~~~~exercism/note
+As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
+~~~~
+
+[Unicode]: https://en.wikipedia.org/wiki/Unicode
 [approach-translate-maketrans]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/translate-maketrans
 [approach-dictionary-join]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/dictionary-join
diff --git a/exercises/practice/rna-transcription/.approaches/translate-maketrans/content.md b/exercises/practice/rna-transcription/.approaches/translate-maketrans/content.md
@@ -1,7 +1,7 @@
 # `translate()` with `maketrans()`
 
 ```python
-LOOKUP = str.maketrans("GCTA", "CGAU")
+LOOKUP = str.maketrans('GCTA', 'CGAU')
 
 
 def to_rna(dna_strand):
@@ -15,20 +15,21 @@ Python doesn't _enforce_ having real constant values,
 but the `LOOKUP` translation table is defined with all uppercase letters, which is the naming convention for a Python [constant][const].
 It indicates that the value is not intended to be changed.
 
-The translation table that is created uses the [ASCII][ASCII] values (also called the ordinal values) for each letter in the two strings.
-The ASCII value for "G" in the first string is the key for the ASCII value of "C" in the second string, and so on.
+The translation table that is created uses the [Unicode][Unicode] _code points_ (sometimes called the ordinal values) for each letter in the two strings.
+As Unicode was designed to be backwards compatible with [ASCII][ASCII] and because the exercise uses Latin letters, the code points in the translation table can be interpreted as ASCII.
+However, the functions can deal with any Unicode character.
+You can learn more by reading about [strings and their representation in the Exercism Python syllabus][concept-strings].
+
+The Unicode value for "G" in the first string is the key for the Unicode value of "C" in the second string, and so on.
 
 In the `to_rna()` function, the [`translate()`][translate] method is called on the input,
 and is passed the translation table.
 The output of `translate()` is a string where all of the input DNA characters have been replaced by their RNA complement in the translation table.
 
-
-~~~~exercism/note
-As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
-~~~~
-
 [dictionaries]: https://docs.python.org/3/tutorial/datastructures.html?#dictionaries
 [maketrans]: https://docs.python.org/3/library/stdtypes.html?#str.maketrans
 [const]: https://realpython.com/python-constants/
 [translate]: https://docs.python.org/3/library/stdtypes.html?#str.translate
 [ASCII]: https://www.asciitable.com/
+[Unicode]: https://en.wikipedia.org/wiki/Unicode
+[concept-strings]: https://exercism.org/tracks/python/concepts/strings
diff --git a/exercises/practice/rna-transcription/.approaches/translate-maketrans/snippet.txt b/exercises/practice/rna-transcription/.approaches/translate-maketrans/snippet.txt
@@ -1,4 +1,4 @@
-LOOKUP = str.maketrans("GCTA", "CGAU")
+LOOKUP = str.maketrans('GCTA', 'CGAU')
 
 
 def to_rna(dna_strand):
